Robust Speech Recognition Parameters for Emotional Variation

Kim Weon-Goo

doi:10.5391/JKIIS.2005.15.6.655

Journal of the Korean Institute of Intelligent Systems

Vol. 15, No. 6 / pp. 655-660 / 2005 / pISSN 1976-9172 / eISSN 2288-2324

Korean Institute of Intelligent Systems

Kim Weon-Goo (School of Electronic and Information Engineering, Kunsan National University)

Published: 2005.12.01

https://doi.org/10.5391/JKIIS.2005.15.6.655

Abstract

This paper studies feature parameters that are less affected by emotional variation, with the goal of developing speech recognition technology that is robust to changes in the speaker's emotion. To this end, a speech database containing various emotions was used to examine both the effect of emotional variation on the performance of a speech recognition system and the feature parameters that are least sensitive to that variation. The feature parameters considered were LPC cepstral coefficients, mel-cepstral coefficients, root-cepstral coefficients, PLP coefficients, RASTA-processed mel-cepstral coefficients, and speech energy. CMS (cepstral mean subtraction) and SBR (signal bias removal) were compared as methods for removing the bias contained in the speech signal. In the experiments, an HMM-based speaker-independent word recognizer using RASTA mel-cepstral coefficients and their delta (derivative) coefficients, with CMS for signal bias removal, gave the best performance, with a word error rate of $7.05\%$. Relative to the baseline system using mel-cepstral coefficients, this is an error reduction of about $59\%$ according to the Korean abstract; the original English abstract states about $52\%$.
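The two post-processing steps named in the abstract, CMS and delta coefficients, can be illustrated with a minimal sketch. This is a generic textbook formulation and not the author's implementation: the function names, the regression window length, and the toy frame values are assumptions made for illustration, and the abstract does not specify the feature dimension or window used in the paper.

```python
from typing import List

Frames = List[List[float]]  # one cepstral vector per analysis frame


def cepstral_mean_subtraction(frames: Frames) -> Frames:
    """Subtract the per-utterance mean of each cepstral coefficient (CMS)."""
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n_frames for k in range(dim)]
    return [[f[k] - mean[k] for k in range(dim)] for f in frames]


def delta(frames: Frames, window: int = 2) -> Frames:
    """Regression-based delta coefficients.

    d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2), n = 1..window.
    Frames beyond the utterance edges are replaced by the first/last frame.
    The window length is an assumed, illustrative default.
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


# Toy data (hypothetical values): a constant offset added to every frame
# models a stationary bias in the cepstral domain.
clean = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]]
biased = [[c + 10.0 for c in frame] for frame in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_biased = cepstral_mean_subtraction(biased)

# Static (CMS-normalized) coefficients concatenated with their deltas.
features = [c + d for c, d in zip(normalized_biased, delta(normalized_biased))]
```

In this toy case `normalized_clean` and `normalized_biased` coincide, which is the property that makes CMS useful against a constant cepstral bias. The delta coefficients are unaffected by such an offset in the first place, because they depend only on differences between frames.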

Keywords

HMM; MFCC
