
Discrimination of Emotional States in Voice and Facial Expression

Kim, Sung-Ill (Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Dept. of Computer Science & Technology, Tsinghua University)
Yasunari Yoshitomi (Dept. of Environmental Information, Faculty of Human Environment, Kyoto Prefectural University)
Chung, Hyun-Yeol (Dept. of Information and Communication Engineering, School of Electrical Engineering and Computer Science, Yeungnam University)
Abstract
The present study describes a combination method for recognizing human affective states such as anger, happiness, sadness, and surprise. To this end, we extracted emotional features from voice signals and facial expressions, and then trained models to recognize emotional states using hidden Markov models (HMMs) and neural networks (NNs). For voice, we used prosodic parameters such as pitch, energy, and their derivatives, which were trained by HMMs for recognition. For facial expressions, on the other hand, we used feature parameters extracted from thermal and visible images, which were trained by NNs for recognition. The recognition rates for the combined parameters obtained from voice and facial expressions were higher than those for either of the two parameter sets alone. The simulation results were also compared with the results of a human questionnaire.
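
The following is a minimal, hypothetical sketch of the decision-level combination described above, assuming per-emotion log-likelihoods from the prosody HMMs and per-emotion outputs from the facial-expression NN are already computed. The function combine_scores, the 0.5 weighting, and the example scores are illustrative assumptions, not the authors' exact formulation.

# Sketch of fusing voice (HMM) and face (NN) evidence, assuming
# precomputed per-emotion scores; weights and values are illustrative.
import numpy as np

EMOTIONS = ["anger", "happiness", "sadness", "surprise"]

def combine_scores(hmm_loglik, nn_scores, voice_weight=0.5):
    # Convert HMM log-likelihoods to a posterior-like distribution
    # (uniform prior assumed); shift by the max for numerical stability.
    voice_post = np.exp(hmm_loglik - np.max(hmm_loglik))
    voice_post /= voice_post.sum()
    # Normalize the NN outputs (assumed nonnegative) to sum to one.
    face_post = nn_scores / nn_scores.sum()
    # Weighted combination of the two modalities, then pick the best label.
    combined = voice_weight * voice_post + (1.0 - voice_weight) * face_post
    return EMOTIONS[int(np.argmax(combined))], combined

# Made-up example: voice mildly favors anger, face favors surprise.
loglik = np.array([-110.0, -118.0, -121.0, -112.0])  # per-emotion HMM scores
nn_out = np.array([0.20, 0.10, 0.15, 0.55])          # per-emotion NN outputs
label, posterior = combine_scores(loglik, nn_out)
print(label, posterior.round(3))

The weighted-sum fusion here is one common choice; the key point is that each modality contributes a full per-emotion score vector, so the combined decision can outperform either modality alone.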
Keywords
Emotion recognition; HMM; Neural network; Prosody; Facial expression