Prosodic Contour Generation for Korean Text-To-Speech System Using Artificial Neural Networks

  • Published: 2009.06.30

Abstract

To generate more natural synthetic speech with a Korean TTS (Text-To-Speech) system, we have to know all the possible prosodic rules of spoken Korean. These rules should be derived from linguistic and phonetic information or from real speech. In general, all of these rules are integrated into the prosody-generation algorithm of a TTS system. However, such an algorithm cannot cover all the possible prosodic rules of a language and is therefore not perfect, so the naturalness of the synthesized speech falls short of what we expect. ANNs (Artificial Neural Networks), on the other hand, can be trained to learn the prosodic rules of spoken Korean. To train and test the ANNs, we need the prosodic patterns of all the phonemic segments in a prosodic corpus. Such a corpus should consist of meaningful sentences that represent all the possible prosodic rules. The sentences in our corpus were constructed by selecting series of words from a list of PB (Phonetically Balanced) isolated words; they were then read by speakers, recorded, and collected into a speech database. By analyzing the recorded speech, we extract the prosodic pattern of each phoneme and assign these patterns as target and test patterns for the ANNs. The ANNs learn the prosody from natural speech and, when the phoneme string of a sentence is presented as the input stimulus, generate the prosodic pattern of the central phonemic segment of that string as the output response.
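The sketch below is not the authors' implementation; it only illustrates the kind of mapping the abstract describes, under several assumptions: a single-hidden-layer network, a one-hot coding of a fixed window of context phonemes as the input stimulus, and a small vector of prosodic targets (normalized duration plus a few F0 samples) for the central phoneme as the output response. All sizes, encodings, and the training pair are hypothetical placeholders.

```python
# Minimal sketch (assumed architecture, not the paper's): a one-hidden-layer network
# mapping a window of phoneme identities to the prosodic pattern of the central phoneme.
import numpy as np

rng = np.random.default_rng(0)

N_PHONEMES = 50   # assumed phoneme-inventory size used for one-hot coding
WINDOW = 5        # assumed context window: 2 left + central + 2 right phonemes
N_HIDDEN = 32     # assumed hidden-layer size
N_OUTPUT = 4      # assumed prosodic targets: duration + 3 F0 samples
IN_DIM = N_PHONEMES * WINDOW

# Randomly initialized weights for the input->hidden and hidden->output layers.
W1 = rng.normal(0.0, 0.1, (IN_DIM, N_HIDDEN)); b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.1, (N_HIDDEN, N_OUTPUT)); b2 = np.zeros(N_OUTPUT)

def encode_window(phoneme_ids):
    """One-hot encode a window of phoneme indices and concatenate them."""
    x = np.zeros((WINDOW, N_PHONEMES))
    x[np.arange(WINDOW), phoneme_ids] = 1.0
    return x.ravel()

def forward(x):
    """Forward pass: sigmoid hidden layer, linear output (prosodic targets)."""
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    return h, h @ W2 + b2

def train_step(x, target, lr=0.05):
    """One backpropagation step on a single (phoneme window, prosody) pair."""
    global W1, b1, W2, b2
    h, y = forward(x)
    err = y - target                      # squared-error gradient at the output
    dW2, db2 = np.outer(h, err), err
    dh = (err @ W2.T) * h * (1.0 - h)     # backpropagate through the sigmoid
    dW1, db1 = np.outer(x, dh), dh
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
    return float(0.5 * np.sum(err ** 2))

# Hypothetical training pair: phoneme-ID window and the prosodic pattern
# (normalized duration and F0 samples) measured from the recorded corpus.
window = [12, 3, 27, 8, 41]
target = np.array([0.4, 0.6, 0.65, 0.55])

x = encode_window(window)
for _ in range(200):
    loss = train_step(x, target)
print("final loss:", loss, "prediction:", forward(x)[1])
```

In practice every phoneme position in every corpus sentence would supply one such (window, prosody) training pair, and the trained network would then be run over the phoneme string of an unseen sentence to generate its prosodic contour.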
