http://dx.doi.org/10.5762/KAIS.2014.15.9.5763

Analysis of Voice Color Similarity for the Development of HMM-Based Emotional Text-to-Speech Synthesis

Min, So-Yeon (Dept. of Information and Communication, Seoil University)
Na, Deok-Su (Voiceware Co., Ltd., R&D Center)
Publication Information
Journal of the Korea Academia-Industrial cooperation Society, vol. 15, no. 9, 2014, pp. 5763-5768
Abstract
Maintaining a consistent voice color is important when a single synthesizer must combine a normal voice with various emotional voices. When a synthesizer is built from recordings in which the emotions are expressed too strongly, the voice color cannot be maintained, and the synthetic speech for each emotion can sound like a different speaker. In this paper, speech data were recorded and the resulting changes in voice color were analyzed for the development of an emotional HMM-based speech synthesizer. Building a speech synthesizer requires recording a voice and constructing a database, and the recording process is especially important for an emotional synthesizer: monitoring is needed because it is difficult to define an emotion and keep its expression at a consistent level. The synthesizer used a normal voice and three emotional voices (happiness, sadness, anger), each recorded at two intensity levels, High and Low. To analyze the voice color of the normal and emotional voices, an average spectrum was computed by accumulating the spectra of the vowels, and the first formant (F1) derived from this average spectrum was compared across voices. The voice similarity of the Low-level emotional data was higher than that of the High-level data, and the proposed measure allows the recording process to be monitored through changes in voice similarity.
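The following minimal sketch illustrates the kind of analysis the abstract describes, assuming the vowel segments have already been extracted from the recordings. It is written in Python with NumPy; the function names, the 1024-point FFT, the 200-1000 Hz F1 search band, and the cosine-similarity measure are illustrative assumptions, not the authors' implementation (in practice F1 is more often estimated with LPC analysis than with the simple peak picking shown here).

import numpy as np

def average_vowel_spectrum(vowel_frames, n_fft=1024):
    # Accumulate the magnitude spectra of all vowel frames and average
    # them; this plays the role of the "average spectrum" above.
    acc = np.zeros(n_fft // 2 + 1)
    for frame in vowel_frames:
        windowed = frame * np.hanning(len(frame))
        acc += np.abs(np.fft.rfft(windowed, n=n_fft))
    return acc / max(len(vowel_frames), 1)

def estimate_f1(avg_spectrum, sr, n_fft=1024, band=(200.0, 1000.0)):
    # Estimate F1 as the strongest spectral peak inside a typical
    # first-formant search band (the 200-1000 Hz band is an assumption).
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[mask][np.argmax(avg_spectrum[mask])]

def voice_color_similarity(spec_a, spec_b):
    # One plausible similarity measure: cosine similarity between two
    # averaged log-magnitude spectra (higher means more similar color).
    a = np.log(spec_a + 1e-10)
    b = np.log(spec_b + 1e-10)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Usage (hypothetical data): compare the normal voice with a High-level
# "anger" recording of the same speaker at a 16 kHz sampling rate.
# normal_avg = average_vowel_spectrum(normal_vowel_frames)
# anger_avg = average_vowel_spectrum(anger_high_vowel_frames)
# print(estimate_f1(normal_avg, 16000), estimate_f1(anger_avg, 16000))
# print(voice_color_similarity(normal_avg, anger_avg))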
Keywords
Emotional Speech Synthesis; Voice Color Similarity