Faculty of ICT, Mahidol University
 


A PROTOTYPE OF THAI TEXT-TO-SPEECH SYNTHESIS BASED ON LINEAR PREDICTIVE CODING METHOD

 

TITLE A PROTOTYPE OF THAI TEXT-TO-SPEECH SYNTHESIS BASED ON LINEAR PREDICTIVE CODING METHOD
AUTHOR SUTTISUN SUTHAD NA AYUDTHYA
DEGREE MASTER OF SCIENCE PROGRAMME IN COMPUTER SCIENCE
FACULTY FACULTY OF SCIENCE
ADVISOR SUPACHAI TANG WONGSAN
CO-ADVISOR CHOMTIP PORN PANOMCHAI
 
ABSTRACT
This research project aimed to develop a prototype Thai text-to-speech system that synthesizes Thai speech from text by means of a computer. The synthesized speech should follow Thai reading principles and resemble the human voice. The prototype consisted of two major subsystems: text processing and signal processing. The text processing subsystem decomposed the input text into phonetic codes, which were then passed to the signal processing subsystem. The signal processing subsystem comprised two processes: mid-tone syllable synthesis and tone transformation. First, mid-tone syllables were synthesized by concatenating speech units recorded as semi-syllables: 288 initial-part units and 243 final-part units. Concatenation allowed 7,776 mid-tone syllables (32 initial consonant sounds * 27 vowel sounds * 9 final consonant sounds) to be synthesized. Next, tone transformation was performed by modifying the pitch values of the mid-tone syllable, which were computed by the autocorrelation method. The modified pitch values were then used to synthesize the speech signal with the LPC (Linear Predictive Coding) method. Modifying the pitch values changed the fundamental frequency of the speech signal. As a result, a mid-tone syllable could be transformed into a low-, falling-, high-, or rising-tone syllable. In total, 38,880 syllables (7,776 mid-tone syllables * 5 tones) could be synthesized. In the experimental stage, the system produced speech for three types of text data: 1) mid-tone syllables, 2) tone-transformed syllables, and 3) sample meaningful sentences. Speech quality was evaluated using the Mean Opinion Score (MOS) method. For the mid-tone syllables, 100% were rated acceptable. For the tone-transformed syllables, by contrast, 61.25% were rated acceptable, 9.75% poor, and 29.00% unacceptable. The sample meaningful sentences were rated on four aspects: pronunciation, distinctness, naturalness, and intelligibility, with scores of 2.58, 2.66, 2.40, and 2.89, respectively. It was therefore concluded that the prototype demonstrated the ability to synthesize speech from Thai text at a fairly satisfactory level.
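The tone transformation step described above (estimate pitch by autocorrelation, then resynthesize with LPC at a modified pitch) can be illustrated with a minimal sketch. It is not the thesis implementation. The function names, the 8 kHz sampling rate, the 60 to 400 Hz search range, the LPC order of 2, the constant 1.25 pitch factor, and the synthetic single-resonator input frame are all assumptions made for illustration. Recorded speech would normally need a higher LPC order, and a real tone would follow a pitch contour that changes over the syllable instead of one constant shift.

```python
import math

def autocorr(x, max_lag):
    """Return r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def estimate_pitch_hz(frame, fs, f_lo=60.0, f_hi=400.0):
    """Autocorrelation pitch estimate: pick the strongest lag in the allowed range."""
    lag_min, lag_max = int(fs / f_hi), int(fs / f_lo)
    r = autocorr(frame, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: r[k])
    return fs / best_lag

def lpc(frame, order):
    """LPC coefficients a[0..order] (a[0] = 1) via Levinson-Durbin on a Hamming-windowed frame."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    r = autocorr(windowed, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error energy
    return a

def synthesize(a, pitch_hz, fs, n):
    """Drive the all-pole filter 1/A(z) with a unit impulse train at pitch_hz."""
    period = max(1, round(fs / pitch_hz))
    out = []
    for t in range(n):
        excitation = 1.0 if t % period == 0 else 0.0
        feedback = sum(a[j] * out[t - j] for j in range(1, len(a)) if t - j >= 0)
        out.append(excitation - feedback)
    return out

# Synthetic stand-in for one recorded mid-tone frame (assumption): a 100 Hz
# pulse train through a single resonance at 700 Hz with about 100 Hz bandwidth.
fs = 8000
radius = math.exp(-math.pi * 100 / fs)
theta = 2 * math.pi * 700 / fs
resonator = [1.0, -2 * radius * math.cos(theta), radius * radius]
source = synthesize(resonator, 100.0, fs, 400)

f0 = estimate_pitch_hz(source, fs)              # pitch of the "mid-tone" frame
a = lpc(source, order=2)                        # spectral envelope of that frame
shifted = synthesize(a, f0 * 1.25, fs, 400)     # same envelope, raised pitch
f0_shifted = estimate_pitch_hz(shifted, fs)
print(f"source pitch: {f0:.1f} Hz, resynthesized pitch: {f0_shifted:.1f} Hz")
```

The sketch separates the two quantities the abstract relies on: the LPC coefficients carry the spectral envelope of the syllable, while the excitation period carries the pitch. Changing only the excitation period changes the fundamental frequency and leaves the envelope as estimated.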
KEYWORD TEXT-TO-SPEECH / SPEECH SYNTHESIS / LPC

 


 
