[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5391/JKIIS.2005.15.6.655

Robust Speech Recognition Parameters for Emotional Variation

Kim Weon-Goo (School of Electronic and Information Engineering, Kunsan National University)

Publication Information

Journal of the Korean Institute of Intelligent Systems / v.15, no.6, 2005, pp. 655-660

Abstract

This paper studies feature parameters that are less affected by emotional variation, with the aim of developing robust speech recognition technology. To this end, the effect of emotional variation on a speech recognition system, and feature parameters robust to that variation, were studied using a speech database containing various emotions. LPC cepstral coefficients, mel-cepstral coefficients, root-cepstral coefficients, PLP coefficients, and RASTA mel-cepstral coefficients were used as feature parameters, and the CMS and SBR methods were used as signal bias removal techniques. Experimental results showed that an HMM-based speaker-independent word recognizer using RASTA mel-cepstral coefficients and their derivatives, with CMS as the signal bias removal technique, gave the best performance, a word error rate of $7.05\%$. This corresponds to a word error reduction of about $52\%$ compared with the baseline system using mel-cepstral coefficients.
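As a rough illustration of the bias removal step named in the abstract, the sketch below shows cepstral mean subtraction (CMS) in its simplest per-utterance form: the mean of each cepstral coefficient over all frames is subtracted from every frame. The abstract gives no implementation details, so this is not the paper's code; the function name, the toy frame values, and the constant `bias` vector are all hypothetical and chosen only to show that a fixed offset in the cepstral domain cancels after CMS.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over the utterance from every frame.

    `frames` is a list of frames, each a list of cepstral coefficients.
    (Hypothetical helper for illustration, not taken from the paper.)
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient across all frames of the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy utterance: 3 frames of 2 cepstral coefficients (made-up values).
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]

# Assumption: a stationary distortion shows up as the same offset in every frame.
bias = [0.5, -1.5]
biased = [[c + b for c, b in zip(frame, bias)] for frame in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_biased = cepstral_mean_subtraction(biased)

# After CMS the two versions agree up to floating-point rounding.
max_diff = max(
    abs(a - b)
    for fa, fb in zip(normalized_clean, normalized_biased)
    for a, b in zip(fa, fb)
)
print(max_diff < 1e-9)
```

Only the constant part of a distortion is removed this way. Variation that changes from frame to frame within the utterance is untouched, which is one reason such normalization is typically combined with other robust features, as in the RASTA plus CMS configuration the abstract reports.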

Keywords

HMM; MFCC
