Discrimination of Emotional States in Voice and Facial Expression

  • Kim, Sung-Ill (Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Dept. of Computer Science & Technology, Tsinghua University) ;
  • Yoshitomi, Yasunari (Dept. of Environmental Information, Faculty of Human Environment, Kyoto Prefectural University) ;
  • Chung, Hyun-Yeol (Dept. of Information and Communication Engrg., School of Electrical Engrg. and Computer Science, Yeungnam University)
  • Published: 2002.06.01

Abstract

The present study describes a combination method for recognizing human affective states such as anger, happiness, sadness, and surprise. To this end, we extracted emotional features from voice signals and facial expressions, and then trained a hidden Markov model (HMM) and a neural network (NN) on them to recognize emotional states. For voice, we used prosodic parameters such as pitch, energy, and their derivatives, which were modeled by the HMM for recognition. For facial expressions, on the other hand, we used feature parameters extracted from thermal and visible images, which were trained by the NN for recognition. The recognition rates for the combined parameters obtained from voice and facial expressions were better than those for either set of parameters alone. The simulation results were also compared with the results of a human questionnaire.
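The abstract outlines a late-fusion architecture: per-emotion HMMs over prosodic sequences for the voice channel, an NN over image-derived feature vectors for the face channel, and a combination of the two sets of scores. The sketch below is a minimal Python reading of that pipeline; the per-emotion GaussianHMM models, the MLPClassifier standing in for the face network, the softmax normalization, and the fusion weight w are illustrative assumptions, not the authors' implementation.

    # Hedged sketch of the voice/face late fusion described in the abstract.
    # Assumptions: one GaussianHMM per emotion over prosodic sequences
    # (pitch, energy, and their deltas), an MLP over facial feature vectors,
    # and a weighted sum of normalized scores for the combination.
    import numpy as np
    from hmmlearn import hmm
    from sklearn.neural_network import MLPClassifier

    EMOTIONS = ["anger", "happiness", "sadness", "surprise"]

    def train_voice_hmms(seqs_by_emotion, n_states=3):
        """Fit one HMM per emotion; each sequence is a (T, n_features) array."""
        models = {}
        for emo, seqs in seqs_by_emotion.items():
            X = np.vstack(seqs)                # hmmlearn takes stacked sequences
            lengths = [len(s) for s in seqs]   # plus per-sequence lengths
            m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
            m.fit(X, lengths)
            models[emo] = m
        return models

    def voice_scores(models, seq):
        """Per-emotion log-likelihoods, softmax-normalized to pseudo-probabilities."""
        ll = np.array([models[e].score(seq) for e in EMOTIONS])
        ll -= ll.max()                         # stabilize the exponentiation
        p = np.exp(ll)
        return p / p.sum()

    def combined_emotion(models, face_nn, prosody_seq, face_vec, w=0.5):
        """Fuse voice and face scores; assumes face_nn.classes_ matches
        EMOTIONS order so the predict_proba columns line up."""
        pv = voice_scores(models, prosody_seq)
        pf = face_nn.predict_proba(face_vec.reshape(1, -1))[0]
        return EMOTIONS[int(np.argmax(w * pv + (1.0 - w) * pf))]

In this reading, the fusion weight w sets the relative contribution of the voice channel; the paper's finding that the combined parameters outperform either channel alone corresponds to the fused score beating pv or pf used in isolation.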
