Speech Emotion Recognition using Feature Selection and Fusion Method

  • Kim, Weon-Goo (Dept. of Electrical Engineering, Kunsan National University)
  • Received : 2017.03.23
  • Accepted : 2017.07.11
  • Published : 2017.08.01

Abstract

This paper studies a speech parameter fusion method for improving the performance of conventional emotion recognition systems. To this end, the cepstrum parameters and the various pitch parameters used in conventional systems are combined, and the combination that gives the best performance is selected. The pitch parameters were derived from the pitch of the speech signal using numerical and statistical methods. A Gaussian mixture model (GMM) based emotion recognition system was used to evaluate which pitch parameters perform best in combination with the cepstrum parameters, with sequential feature selection as the selection method. In an experiment distinguishing four emotions (normal, joy, sadness, and anger), 15 of the 56 pitch parameters were selected, and these gave the best recognition performance when fused with the cepstrum and delta cepstrum coefficients. This corresponds to a 48.9% reduction in error relative to an emotion recognition system using only pitch parameters.
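
The selection procedure summarized in the abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the forward variant of sequential feature selection is assumed (the abstract does not say which variant was used), a single diagonal Gaussian per emotion stands in for the GMM, the data are synthetic, and all function and variable names are hypothetical. Feature 0 plays the role of an always-included base feature (as the cepstrum coefficients might be), and features 1 and 2 play the role of candidate pitch parameters.

```python
import math
import random

def fit_diag_gaussian(rows):
    """Per-feature mean and variance: a one-component, diagonal stand-in for a GMM."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    varis = [max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)
             for d in range(dims)]
    return means, varis

def log_likelihood(x, model):
    """Log-density of x under the diagonal Gaussian model."""
    means, varis = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, varis))

def accuracy(train, test, feats):
    """Fit one model per emotion on the chosen features; return test accuracy."""
    pick = lambda x: [x[i] for i in feats]
    models = {}
    for label in {y for _, y in train}:
        models[label] = fit_diag_gaussian([pick(x) for x, y in train if y == label])
    hits = sum(max(models, key=lambda c: log_likelihood(pick(x), models[c])) == y
               for x, y in test)
    return hits / len(test)

def sequential_forward_selection(train, test, candidates, base=()):
    """Greedily add the candidate that most improves accuracy; stop when none helps."""
    selected = list(base)
    best = accuracy(train, test, selected) if selected else 0.0
    remaining = list(candidates)
    while remaining:
        top_score, top_feat = max((accuracy(train, test, selected + [f]), f)
                                  for f in remaining)
        if top_score <= best:
            break
        selected.append(top_feat)
        remaining.remove(top_feat)
        best = top_score
    return selected, best

# Synthetic two-emotion data (assumption: purely illustrative, not from the paper).
random.seed(0)

def sample(label, n):
    shift = 0.0 if label == "normal" else 1.0
    return [([random.gauss(shift, 1.0),        # feature 0: weakly informative base
              random.gauss(6.0 * shift, 1.0),  # feature 1: informative candidate
              random.gauss(0.0, 1.0)],         # feature 2: pure-noise candidate
             label) for _ in range(n)]

train = sample("normal", 200) + sample("anger", 200)
test = sample("normal", 100) + sample("anger", 100)

baseline = accuracy(train, test, [0])
selected, best = sequential_forward_selection(train, test, candidates=[1, 2], base=[0])
print("selected features:", selected)
print("baseline accuracy:", baseline, "after selection:", best)
```

In practice the scoring function would be the recognition rate of a multi-component GMM on held-out speech, and the candidate set would be the 56 pitch parameters. The greedy loop itself stays the same whatever classifier is plugged into `accuracy`.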
