감정 변화에 강인한 음성 인식 파라메터

Robust Speech Recognition Parameters for Emotional Variation

  • 김원구 (School of Electronics and Information Engineering, Kunsan National University)
  • Published: 2005.12.01

Abstract

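This paper investigates feature parameters for speech recognition systems that are less affected by emotional variation, with the goal of developing speech recognition technology that is robust to changes in human emotion. To this end, a speech database containing various emotions was used to study both the effect of emotional variation on recognition performance and the feature parameters that are least sensitive to it. The features examined were LPC cepstral coefficients, mel-cepstral coefficients, root-cepstral coefficients, PLP coefficients, RASTA-processed mel-cepstral coefficients, and speech energy. In addition, CMS and SBR were compared as methods for removing the bias contained in the speech signal. In the experiments, an HMM-based speaker-independent word recognizer using RASTA mel-cepstrum and delta cepstrum, with CMS as the signal bias removal method, gave the best performance, with an error rate of $7.05\%$. This is a reduction in error of about $59\%$ relative to the baseline system using mel-cepstrum.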

This paper studies feature parameters that are less affected by emotional variation, with the aim of developing robust speech recognition technology. For this purpose, the effect of emotional variation on a speech recognition system and robust feature parameters for such a system were studied using a speech database containing various emotions. LPC cepstral coefficients, mel-cepstral coefficients, root-cepstral coefficients, PLP coefficients, and RASTA mel-cepstral coefficients were used as feature parameters, and the CMS and SBR methods were used as signal bias removal techniques. Experimental results showed that an HMM-based speaker-independent word recognizer using RASTA mel-cepstral coefficients and their derivatives, with CMS for signal bias removal, gave the best performance, a word error rate of $7.05\%$. This corresponds to about a $52\%$ reduction in word error compared with the baseline system using mel-cepstral coefficients.
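The two post-processing steps named in the abstract, CMS and derivative (delta) features, can be sketched as follows. This is a minimal illustration and not the paper's implementation: it assumes CMS means subtracting the per-utterance mean of each cepstral coefficient, and it uses the common regression formula for deltas with a window of two frames on each side, since the abstract does not state the settings actually used. The function names `cepstral_mean_subtraction`, `delta`, and `add_deltas` are hypothetical, and RASTA filtering itself is not shown.

```python
from typing import List

# One cepstral vector per analysis frame (assumed input layout).
Frames = List[List[float]]


def cepstral_mean_subtraction(frames: Frames) -> Frames:
    """Subtract the utterance-level mean of each cepstral coefficient.

    A constant offset in the cepstral domain (for example a fixed channel
    or speaker bias) is cancelled by this step.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - mean[k] for k in range(dim)] for f in frames]


def delta(frames: Frames, width: int = 2) -> Frames:
    """Regression-based delta features over +/- `width` frames.

    Frames beyond the utterance edges are replaced by the first or last
    frame (edge padding). `width=2` is an assumed default.
    """
    n = len(frames)
    if n == 0:
        return []
    dim = len(frames[0])
    denom = 2 * sum(d * d for d in range(1, width + 1))
    out: Frames = []
    for t in range(n):
        row = []
        for k in range(dim):
            num = 0.0
            for d in range(1, width + 1):
                nxt = frames[min(t + d, n - 1)][k]
                prv = frames[max(t - d, 0)][k]
                num += d * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out


def add_deltas(frames: Frames) -> Frames:
    """Append delta features to each static feature vector."""
    return [s + d for s, d in zip(frames, delta(frames))]


if __name__ == "__main__":
    # Toy 2-dimensional "cepstra" for a 3-frame utterance.
    utterance = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    normalized = cepstral_mean_subtraction(utterance)
    for vector in add_deltas(normalized):
        print(vector)
```

Because CMS removes only a constant cepstral offset, adding the same bias vector to every frame of `utterance` leaves `normalized` unchanged; that is the sense in which it acts as a bias removal method here.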
