http://dx.doi.org/10.5391/JKIIS.2010.20.5.683

Emotion Robust Speech Recognition using Speech Transformation  

Kim, Weon-Goo (Department of Electrical Engineering, Kunsan National University)
Publication Information
Journal of the Korean Institute of Intelligent Systems / v.20, no.5, 2010, pp. 683-687
Abstract
This paper studies frequency warping, one of the speech transformation methods, as a way to develop a speech recognition system that is robust to emotional variation. To this end, the effect of emotional variation on the speech signal was studied using a speech database containing various emotions. The speech spectrum was observed to be affected by emotional variation, and this effect is one of the reasons the performance of a speech recognition system degrades. A new training method that uses frequency warping in the training process is presented to reduce the effect of emotional variation, and a speech recognition system based on vocal tract length normalization was developed for comparison with the proposed system. Experimental results from isolated word recognition using HMMs showed that the new training method reduced the error rate of the conventional recognition system on speech signals containing various emotions.
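The abstract does not give the warping function itself, so the following is only a minimal sketch of the general idea, assuming the piecewise-linear warp that is common in vocal tract length normalization. The names `warp_frequency`, `warp_spectrum`, the warp factor `alpha`, and the breakpoint `f_break` are hypothetical and are not taken from the paper.

```python
def warp_frequency(f, alpha, f_break=0.875):
    """Piecewise-linear warp of a normalized frequency f in [0, 1].

    Assumption: alpha is a warp factor near 1.0 and f_break is an
    illustrative breakpoint; neither value comes from the paper.
    """
    # Shrink the breakpoint for alpha > 1 so its image stays below 1.
    f0 = f_break * min(1.0, 1.0 / alpha)
    if f <= f0:
        return alpha * f
    # The second segment joins (f0, alpha * f0) to (1, 1), so the upper
    # band edge is mapped onto itself.
    slope = (1.0 - alpha * f0) / (1.0 - f0)
    return alpha * f0 + slope * (f - f0)


def warp_spectrum(spectrum, alpha):
    """Resample a magnitude spectrum along the warped frequency axis.

    Output bin k takes the linearly interpolated value of the input
    spectrum at the warped position of bin k. Needs at least two bins.
    """
    n = len(spectrum)
    warped = []
    for k in range(n):
        pos = warp_frequency(k / (n - 1), alpha) * (n - 1)
        pos = min(max(pos, 0.0), float(n - 1))  # guard against rounding
        lo = min(int(pos), n - 2)
        frac = pos - lo
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[lo + 1])
    return warped
```

For example, `warp_spectrum(frame, 1.1)` compresses the spectral envelope of one frame toward lower frequencies, while `alpha = 1.0` leaves it unchanged. How the warp factor is chosen, and how warped features enter training, is described in the paper rather than in the abstract, so it is not sketched here.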
Keywords
robust speech recognition; frequency warping; vocal tract length normalization