A Study on Robust Speech Emotion Feature Extraction Under the Mobile Communication Environment

이동통신 환경에서 강인한 음성 감성특징 추출에 대한 연구

  • Cho, Yun-Ho (Dept. of Information and Computer Science, Dankook University) ;
  • Park, Kyu-Sik (Dept. of Information and Computer Science, Dankook University)
  • Published: 2006.08.01

Abstract

In this paper, we propose an emotion recognition system that can discriminate a speaker's emotional state as neutral or anger from speech captured by a cellular phone in real time. In general, speech transmitted over the mobile network contains environmental noise and network noise, which distort the emotional features of the query speech and can cause serious degradation of system performance. In order to minimize the effect of this noise and thus improve system performance, we apply an MA (Moving Average) filter, which has a relatively simple structure and low computational complexity, to the emotional feature vectors to alleviate the distortion. An SFS (Sequential Forward Selection) feature optimization method is then implemented to further improve and stabilize system performance. Two pattern recognition methods, k-NN and SVM, are compared for emotional state classification. The experimental results indicate that the proposed method provides stable and successful emotion classification performance of about 86.5%, so it will be very useful in application areas such as customer call centers.

This paper proposes a speech emotion recognition system that can classify a speaker's emotional state as neutral or anger from speech acquired in real time over a cellular phone. In general, speech received over a mobile phone contains the speaker's environmental noise and network noise, which distort the emotional features of the speech signal and cause serious performance degradation in the recognition system. To minimize the influence of this noise, we apply an MA (Moving Average) filter, which has a relatively simple structure and low computational cost, to the emotional feature vectors, thereby minimizing the noise-induced degradation of system performance. In addition, the SFS (Sequential Forward Selection) method is used to optimize the feature vectors and further stabilize the performance of the proposed emotion recognition system, and k-NN and SVM are compared as emotion pattern classifiers. Experimental results show that the proposed system achieves a high recognition rate of about 86.5% in a mobile communication noise environment, so it is expected to be useful in applications such as customer call centers.
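
The abstract names two processing steps applied to the extracted emotion features: MA-filter smoothing and SFS feature selection scored with a k-NN classifier. The following is a minimal sketch of those two steps, not the authors' implementation; the filter order, the number of selected features, the value of k, and the placeholder data are assumptions made only for illustration.

```python
# Sketch of MA smoothing of frame-level emotion features and greedy SFS
# with a k-NN scorer.  All parameter values below are illustrative assumptions.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def ma_smooth(frames, order=5):
    """Smooth each feature track of a (num_frames x num_dims) matrix with a
    simple moving-average filter to suppress noise-induced fluctuations."""
    kernel = np.ones(order) / order
    return np.apply_along_axis(
        lambda track: np.convolve(track, kernel, mode="same"), 0, frames)

def sfs(X, y, n_select, k=5):
    """Greedy sequential forward selection; each candidate feature subset is
    scored by 5-fold cross-validation of a k-NN classifier."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select and remaining:
        scores = {
            j: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                               X[:, selected + [j]], y, cv=5).mean()
            for j in remaining
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

# Placeholder data: per-frame pitch/energy/MFCC tracks for one utterance, and
# utterance-level feature vectors for 60 utterances labelled neutral (0) or anger (1).
rng = np.random.default_rng(0)
frame_features = ma_smooth(rng.normal(size=(200, 3)), order=5)
X = rng.normal(size=(60, 12))
y = rng.integers(0, 2, size=60)
print("selected feature indices:", sfs(X, y, n_select=4))
```

As in the paper, the filter is applied to the emotional feature sequence rather than to the raw waveform, which is what ma_smooth sketches above; the SVM classifier compared in the paper could be substituted for the k-NN scorer, e.g. via sklearn.svm.SVC.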

