
Statistical Speech Feature Selection for Emotion Recognition

Kwon Oh-Wook (Chungbuk National University)
Chan Kwokleung (University of California)
Lee Te-Won (University of California)

Publication Information

The Journal of the Acoustical Society of Korea, v.24, no.4E, 2005, pp. 144-151

Abstract

We evaluate the performance of emotion recognition from speech signals when a plain speaker talks to an entertainment robot. For each frame of a speech utterance, we extract frame-based features: pitch, energy, formant, band energies, mel frequency cepstral coefficients (MFCCs), and the velocity and acceleration of pitch and MFCCs. For discriminative classifiers, a fixed-length utterance-based feature vector is computed from the statistics of the frame-based features. Using a speaker-independent database, we evaluate the performance of two promising classifiers: the support vector machine (SVM) and the hidden Markov model (HMM). For angry/bored/happy/neutral/sad emotion classification, the SVM and HMM classifiers yield 42.3% and 40.8% accuracy, respectively. We show that this accuracy is significant compared with the performance of foreign human listeners.
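The step from variable-length frame-based features to a fixed-length utterance-based vector can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which statistics were used, so the five statistics here (mean, standard deviation, minimum, maximum, range) are an assumption, as is the use of a simple first difference as the "velocity" of a feature track. The function names `deltas` and `utterance_vector` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def deltas(track):
    """First-order differences of a frame-level track (an assumed, simple 'velocity')."""
    return [b - a for a, b in zip(track, track[1:])]


def utterance_vector(frames):
    """Map a variable-length list of per-frame feature rows to a fixed-length vector.

    frames: list of equal-length lists, one row per frame.
    Returns 5 statistics per feature dimension, for the static track and for its deltas.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to compute deltas")
    vector = []
    for track in zip(*frames):  # one track per feature dimension
        track = list(track)
        for series in (track, deltas(track)):
            vector.extend([
                mean(series),
                pstdev(series),
                min(series),
                max(series),
                max(series) - min(series),
            ])
    return vector


# Hypothetical utterances with two features per frame (say, pitch in Hz and log energy).
short_utt = [[120.0, 0.5], [125.0, 0.75], [130.0, 1.0]]
long_utt = [[200.0, 1.0], [190.0, 0.5], [180.0, 0.5], [170.0, 0.25], [160.0, 0.0]]

vec_short = utterance_vector(short_utt)
vec_long = utterance_vector(long_utt)
print(len(vec_short), len(vec_long))
```

Both utterances map to vectors of the same length regardless of how many frames they contain, which is the property a discriminative classifier such as an SVM needs; an HMM, by contrast, can consume the frame sequence directly.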

Keywords

Emotion Recognition; Support Vector Machines; Hidden Markov Models
