Syllables Selection for the Development of Speech Database for Punjabi TTS System
Selecting the speech unit, and then the number of such units to store in the speech database, is one of the most important and tedious tasks in building a text-to-speech (TTS) system. Syllables have been reported as a good choice of speech unit for the speech databases of many languages, and they have been selected as the speech unit for the Punjabi speech database developed in this work. To minimize the database size, efforts have been made to select a minimal set of syllables that covers almost the whole Punjabi word set. To accomplish this, all Punjabi syllables have been statistically analyzed on a Punjabi corpus of more than 104 million words. This analysis helped to select a relatively small set of the most frequently occurring syllables: about the first ten thousand syllables (0.86% of the 1156740 total available syllables), with a cumulative frequency of occurrence (FOO) of less than 99.81%. In addition, to improve the efficiency of the TTS system, interesting facts about Punjabi syllables have been obtained from their FOO at the three positions (starting, middle and end) in words.
Keywords: Speech database, Punjabi syllables, Punjabi TTS system
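The selection idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names select_by_coverage and positional_counts, the toy romanized syllables, and the 0.8 coverage target are all hypothetical, and the words are assumed to be already split into syllables (the syllabification step itself is not shown). The sketch ranks syllables by frequency of occurrence, keeps the most frequent ones until their cumulative share reaches a target, and separately counts syllables at the starting, middle and end positions of words.

```python
from collections import Counter


def select_by_coverage(counts, target):
    """Keep the most frequent syllables until their cumulative share
    of all occurrences first reaches `target` (a fraction in 0..1)."""
    total = sum(counts.values())
    selected = []
    running = 0
    for syllable, freq in counts.most_common():
        if running / total >= target:
            break
        selected.append(syllable)
        running += freq
    return selected, running / total


def positional_counts(words):
    """Count syllables at the starting, middle and end positions.

    Assumption of this sketch: a one-syllable word contributes only
    to the "start" counter.
    """
    pos = {"start": Counter(), "middle": Counter(), "end": Counter()}
    for syllables in words:
        if not syllables:
            continue
        pos["start"][syllables[0]] += 1
        if len(syllables) > 1:
            pos["end"][syllables[-1]] += 1
        for syllable in syllables[1:-1]:
            pos["middle"][syllable] += 1
    return pos


# Toy, pre-syllabified "corpus" (placeholder syllables, not real data).
words = [["ka", "ra"], ["ka", "ma", "la"], ["ka"], ["ra", "ma"]]

counts = Counter(s for word in words for s in word)
selected, coverage = select_by_coverage(counts, target=0.8)
by_position = positional_counts(words)

print(selected, coverage)
print(by_position["start"].most_common(1))
```

On a real corpus the same loop would run over the full syllable frequency table, with the coverage target chosen to trade database size against word coverage.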