http://dx.doi.org/10.5391/JKIIS.2010.20.4.528

A Training Method for Emotionally Robust Speech Recognition using Frequency Warping  

Kim, Weon-Goo (Department of Electrical Engineering, Kunsan National University)
Publication Information
Journal of the Korean Institute of Intelligent Systems / v.20, no.4, 2010, pp. 528-533
Abstract
This paper studies training methods that are less affected by emotional variation, with the goal of developing a robust speech recognition system. To this end, the effects of emotional variation on the speech signal and on speech recognition performance were examined using a speech database containing various emotions. A speech recognition system trained on speech that contains no emotion performs worse when the test speech does contain emotion, because of the emotional mismatch between the training and test data. This study observes that the speaker's vocal tract length is affected by emotional variation, and that this effect is one of the reasons recognition performance degrades. A training method that covers these speech variations is therefore proposed to build an emotionally robust speech recognition system. In isolated word recognition experiments using HMMs, the proposed method reduced the error rate of the conventional recognition system by 28.4% when emotional test data was used.
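The abstract does not spell out how the training data "covers" vocal tract length variation, so the following is only a minimal sketch of one common approach, assuming frequency-warped multi-condition training: the same training spectra are converted to MFCC-style features under several warp factors, and the warped copies are pooled for HMM training. The piecewise-linear warping function, its 0.85 cutoff ratio, the warp factors, and every function name (`warp_frequency`, `warped_filterbank`, `mfcc_from_power`, `augment_with_warping`) are illustrative assumptions, not the paper's published procedure. HMM training itself is omitted.

```python
import math
from typing import Dict, List, Sequence


# Assumption: piecewise-linear warping of the kind often used for VTLN;
# the 0.85 cutoff ratio is illustrative and not taken from the paper.
def warp_frequency(f: float, alpha: float, f_max: float, cut_ratio: float = 0.85) -> float:
    """Scale f by alpha below a cutoff, then map linearly so f_max stays at f_max."""
    f0 = cut_ratio * f_max * min(1.0, 1.0 / alpha)
    if f <= f0:
        return alpha * f
    # Straight line from (f0, alpha*f0) to (f_max, f_max) keeps the band edge fixed.
    return alpha * f0 + (f_max - alpha * f0) * (f - f0) / (f_max - f0)


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def warped_filterbank(n_filters: int, n_bins: int, sample_rate: float, alpha: float) -> List[List[float]]:
    """Triangular mel filterbank whose edge frequencies are warped by alpha."""
    f_max = sample_rate / 2.0
    mel_max = hz_to_mel(f_max)
    # n_filters + 2 edge points, equally spaced on the mel scale, then warped in Hz.
    edges = [warp_frequency(mel_to_hz(mel_max * i / (n_filters + 1)), alpha, f_max)
             for i in range(n_filters + 2)]
    bin_freqs = [k * f_max / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, center, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= center:
                row.append((f - lo) / (center - lo))   # rising slope
            elif center < f < hi:
                row.append((hi - f) / (hi - center))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_from_power(power: Sequence[float], bank: List[List[float]], n_ceps: int) -> List[float]:
    """Log filterbank energies followed by an unnormalized DCT-II."""
    log_e = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10)) for row in bank]
    n = len(log_e)
    return [sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n) for j in range(n))
            for i in range(n_ceps)]


def augment_with_warping(frames: List[Sequence[float]], alphas: Sequence[float],
                         sample_rate: float, n_filters: int = 24,
                         n_ceps: int = 13) -> Dict[float, List[List[float]]]:
    """Hypothetical helper: one warped copy of the training features per alpha."""
    n_bins = len(frames[0])
    features = {}
    for alpha in alphas:
        bank = warped_filterbank(n_filters, n_bins, sample_rate, alpha)
        features[alpha] = [mfcc_from_power(frame, bank, n_ceps) for frame in frames]
    return features


if __name__ == "__main__":
    # Toy power spectra: 3 frames, 257 bins (a 512-point FFT at 16 kHz).
    frames = [[1.0 + 0.1 * k for k in range(257)] for _ in range(3)]
    feats = augment_with_warping(frames, [0.9, 0.95, 1.0, 1.05, 1.1], 16000)
    print(len(feats), len(feats[1.0]), len(feats[1.0][0]))  # 5 3 13
```

With alpha = 1.0 the warping is the identity, so that copy matches ordinary MFCC extraction; the other factors shift the filter edges and mimic a range of vocal tract lengths. Pooling all copies is one simple way to expose the models to this variation, at the cost of multiplying the training data volume by the number of warp factors.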
Keywords
MFCC;
Citations & Related Records
Times Cited By KSCI: 1