Statistical Speech Feature Selection for Emotion Recognition  

Kwon Oh-Wook (Chungbuk National University)
Chan Kwokleung (University of California)
Lee Te-Won (University of California)
Abstract
We evaluate the performance of emotion recognition from speech signals when an ordinary speaker talks to an entertainment robot. For each frame of a speech utterance, we extract frame-based features: pitch, energy, formants, band energies, mel-frequency cepstral coefficients (MFCCs), and the velocity and acceleration of pitch and MFCCs. For discriminative classifiers, a fixed-length utterance-based feature vector is computed from the statistics of the frame-based features. Using a speaker-independent database, we evaluate the performance of two promising classifiers: the support vector machine (SVM) and the hidden Markov model (HMM). For five-class emotion classification (angry, bored, happy, neutral, sad), the SVM and HMM classifiers yield 42.3% and 40.8% accuracy, respectively. We show that this accuracy is significant when compared with the performance of foreign human listeners.
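The step from frame-based features to a fixed-length utterance-based vector can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which statistics are used, so the mean, standard deviation, minimum, and maximum here are assumed choices, and the names `deltas` and `utterance_vector` are hypothetical. For brevity the sketch also takes velocity and acceleration of every feature track, whereas the abstract applies them only to pitch and MFCCs.

```python
from statistics import fmean, pstdev


def deltas(track):
    """First-order differences, padded so the length matches the input."""
    diffs = [b - a for a, b in zip(track, track[1:])]
    return diffs[:1] + diffs  # repeat the first difference as padding


def utterance_vector(frames):
    """Map a variable-length list of frames to a fixed-length vector.

    frames: list of per-frame feature lists, e.g. [pitch, energy, ...].
    The statistics chosen below are assumptions made for illustration.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to compute differences")
    columns = [list(col) for col in zip(*frames)]  # one track per feature
    vector = []
    for static in columns:
        velocity = deltas(static)
        acceleration = deltas(velocity)
        for track in (static, velocity, acceleration):
            vector += [fmean(track), pstdev(track), min(track), max(track)]
    return vector


# Two utterances of different durations, two features per frame.
short = [[100.0, 0.5], [110.0, 0.6], [120.0, 0.7]]
long_ = [[100.0, 0.5], [105.0, 0.4], [95.0, 0.6], [90.0, 0.7], [98.0, 0.5]]
short_vec = utterance_vector(short)
long_vec = utterance_vector(long_)
```

Both utterances yield vectors of the same length (features × 3 tracks × 4 statistics), which is what lets a discriminative classifier such as an SVM accept utterances of any duration. An HMM, by contrast, can model the frame sequence directly.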
Keywords
Emotion Recognition; Support Vector Machines; Hidden Markov Models