Incomplete Cholesky Decomposition based Kernel Cross Modal Factor Analysis for Audiovisual Continuous Dimensional Emotion Recognition
Li Xia, Lu Guanming, Yan Jingjie, Li Haibo, Zhang Zhengyan, Xie Shipeng (College of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications); Sun Ning (Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications)
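The factorization named in the title makes kernel cross-modal factor analysis tractable by replacing the full kernel Gram matrix with a low-rank factor. As a rough illustration only (the function below, its name, and its tolerance parameter are our own sketch under standard assumptions about pivoted incomplete Cholesky, not the authors' implementation), the decomposition K ≈ GGᵀ can be computed as:

```python
import numpy as np

def incomplete_cholesky(K, tol=1e-6, max_rank=None):
    """Pivoted incomplete Cholesky of a PSD kernel matrix K (illustrative sketch).

    Returns G of shape (n, m) with K ~= G @ G.T, stopping once the trace of
    the residual drops below `tol` or the rank reaches `max_rank`.
    """
    n = K.shape[0]
    if max_rank is None:
        max_rank = n
    d = np.diag(K).astype(float)      # residual diagonal (copy)
    perm = np.arange(n)               # pivot order
    G = np.zeros((n, max_rank))
    m = 0
    while m < max_rank and d[perm[m:]].sum() > tol:
        # pivot: bring the largest residual diagonal to position m
        j = m + np.argmax(d[perm[m:]])
        perm[[m, j]] = perm[[j, m]]
        p = perm[m]
        G[p, m] = np.sqrt(d[p])
        rest = perm[m + 1:]
        # Schur-complement update of the remaining rows
        G[rest, m] = (K[rest, p] - G[rest, :m] @ G[p, :m]) / G[p, m]
        d[rest] -= G[rest, m] ** 2
        m += 1
    return G[:, :m]

# toy check on an RBF kernel matrix (synthetic data, not from the paper)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)
G = incomplete_cholesky(K, tol=1e-8)
print(G.shape[1], float(np.abs(K - G @ G.T).max()))  # rank used, max error
```

The stopping rule bounds the trace of the PSD residual, which in turn bounds every entry of K − GGᵀ, so the approximation error is controlled directly by `tol`.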
[1] Z. Huang, T. Dang, N. Cummins, B. Stasak, P. Le, V. Sethu, and J. Epps, "An investigation of annotation delay compensation and output-associative fusion for multimodal continuous emotion prediction," in Proc. of 5th International Workshop on Audio/Visual Emotion Challenge, pp. 41-48, Oct. 2015.
[2] C. C. Chang and C. J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, Apr. 2011.
[3] G. Trigeorgis, F. Ringeval, R. Brueckner, E. Marchi, M. A. Nicolaou, B. Schuller, and S. Zafeiriou, "Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network," in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5200-5203, Mar. 2016.
[4] Z. Zeng, M. Pantic, G. Roisman, and T. Huang, "A survey of affect recognition methods: audio, visual and spontaneous expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39-58, Jan. 2009.
[5] J. Yan, W. Zheng, M. Xin, and J. Yan, "Integrating facial expression and body gesture in videos for emotion recognition," IEICE Transactions on Information and Systems, vol. E97.D, no. 3, pp. 610-613, Mar. 2014.
[6] J. Yan, W. Zheng, Q. Xu, G. Lu, H. Li, and B. Wang, "Sparse kernel reduced-rank regression for bimodal emotion recognition from facial expression and speech," IEEE Transactions on Multimedia, vol. 18, no. 7, pp. 1319-1329, Jul. 2016.
[7] Y. Wang, L. Guan, and A. N. Venetsanopoulos, "Audiovisual emotion recognition via cross-modal association in kernel space," in Proc. of IEEE International Conference on Multimedia & Expo, pp. 1-6, Jul. 2011.
[8] Y. Wang, L. Guan, and A. N. Venetsanopoulos, "Kernel cross-modal factor analysis for information fusion with application to bimodal emotion recognition," IEEE Transactions on Multimedia, vol. 14, no. 3, pp. 597-607, Jun. 2012.
[9] D. Li, N. Dimitrova, M. Li, and I. K. Sethi, "Multimedia content processing through cross-modal association," in Proc. of 11th ACM International Conference on Multimedia, pp. 604-611, Nov. 2003.
[10] C. H. Wu, J. C. Lin, and W. L. Wei, "Survey on audiovisual emotion recognition: databases, features, and data fusion strategies," APSIPA Transactions on Signal and Information Processing, vol. 3, pp. 1-18, 2014.
[11] B. Schuller, M. Valstar, F. Eyben, R. Cowie, and M. Pantic, "AVEC 2012 - the continuous audio/visual emotion challenge," in Proc. of 14th ACM International Conference on Multimodal Interaction, pp. 449-456, Oct. 2012.
[12] C. Vinola and K. Vimaladevi, "A survey on human emotion recognition approaches, databases and applications," Electronic Letters on Computer Vision and Image Analysis, vol. 14, no. 2, pp. 24-44, 2015.
[13] L. Pang, S. Zhu, and C. W. Ngo, "Deep multimodal learning for affective analysis and retrieval," IEEE Transactions on Multimedia, vol. 17, no. 11, pp. 2008-2020, Nov. 2015.
[14] C. H. Wu, J. C. Lin, and W. L. Wei, "Two-level hierarchical alignment of semi-coupled HMM-based audiovisual emotion recognition with temporal course," IEEE Transactions on Multimedia, vol. 15, no. 8, pp. 1880-1895, Dec. 2013.
[15] M. Valstar, B. Schuller, K. Smith, F. Eyben, B. Jiang, S. Bilakhia, S. Schnieder, R. Cowie, and M. Pantic, "AVEC 2013 - the continuous audio/visual emotion and depression recognition challenge," in Proc. of 3rd ACM International Workshop on Audio/Visual Emotion Challenge, pp. 3-10, Oct. 2013.
[16] C. Soladie, H. Salam, N. Stoiber, and R. Seguier, "Continuous facial expression representation for multimodal emotion detection," International Journal of Advanced Computer Science, vol. 3, no. 5, pp. 202-216, May 2013.
[17] M. Valstar, B. Schuller, K. Smith, T. Almaev, F. Eyben, J. Krajewski, R. Cowie, and M. Pantic, "AVEC 2014 - 3D dimensional affect and depression recognition challenge," in Proc. of 4th International Workshop on Audio/Visual Emotion Challenge, pp. 3-10, Nov. 2014.
[18] F. Ringeval, B. Schuller, M. Valstar, S. Jaiswal, E. Marchi, D. Lalanne, R. Cowie, and M. Pantic, "AV+EC 2015 - the first affect recognition challenge bridging across audio, video, and physiological data," in Proc. of 5th International Workshop on Audio/Visual Emotion Challenge, pp. 3-8, Oct. 2015.
[19] M. Valstar, J. Gratch, B. Schuller, F. Ringeval, D. Lalanne, M. T. Torres, S. Scherer, G. Stratou, R. Cowie, and M. Pantic, "AVEC 2016 - depression, mood, and emotion recognition workshop and challenge," in Proc. of 6th International Workshop on Audio/Visual Emotion Challenge, pp. 3-10, Oct. 2016.
[20] F. Eyben, M. Wöllmer, M. F. Valstar, H. Gunes, B. Schuller, and M. Pantic, "String-based audiovisual fusion of behavioural events for the assessment of dimensional affect," in Proc. of IEEE International Conference on Automatic Face & Gesture Recognition, pp. 322-329, Mar. 2011.
[21] L. Chao, J. Tao, M. Yang, Y. Li, and Z. Wen, "Long short term memory recurrent neural network based multimodal dimensional emotion recognition," in Proc. of 5th International Workshop on Audio/Visual Emotion Challenge, pp. 65-72, Oct. 2015.
[22] S. Chen and Q. Jin, "Multi-modal dimensional emotion recognition using recurrent neural networks," in Proc. of 5th International Workshop on Audio/Visual Emotion Challenge, pp. 49-56, Oct. 2015.
[23] P. Cardinal, M. Dehak, A. Lameiras, J. Alam, and P. Boucher, "ETS system for AV+EC 2015 challenge," in Proc. of 5th International Workshop on Audio/Visual Emotion Challenge, pp. 17-23, Oct. 2015.
[24] A. Sayedelahl, R. Araujo, and M. S. Kamel, "Audio-visual feature-decision level fusion for spontaneous emotion estimation in speech conversation," in Proc. of IEEE International Conference on Multimedia and Expo Workshops, pp. 1-6, Oct. 2013.
[25] Y. F. A. Gaus, H. Meng, A. Jan, F. Zhang, and S. Turabzadeh, "Automatic affective dimension recognition from naturalistic facial expressions based on wavelet filtering and PLS regression," in Proc. of IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, pp. 1-6, Oct. 2015.
[26] M. Kachele, M. Schels, P. Thiam, and F. Schwenker, "Fusion mappings for multimodal affect recognition," in Proc. of IEEE Symposium Series on Computational Intelligence, pp. 307-313, Jan. 2015.
[27] L. Tian, J. D. Moore, and C. Lai, "Recognizing emotions in dialogues with acoustic and lexical features," in Proc. of IEEE International Conference on Affective Computing and Intelligent Interaction, pp. 737-742, Dec. 2015.
[28] J. Nicolle, V. Rapp, K. Bailly, L. Prevost, and M. Chetouani, "Robust continuous prediction of human emotions using multiscale dynamic cues," in Proc. of 14th ACM International Conference on Multimodal Interaction, pp. 501-508, Oct. 2012.
[29] C. Soladie, H. Salam, C. Pelachaud, N. Stoiber, and R. Seguier, "A multimodal fuzzy inference system using a continuous facial expression representation for emotion detection," in Proc. of 14th ACM International Conference on Multimodal Interaction, pp. 493-500, Oct. 2012.
[30] A. Metallinou, A. Katsamanis, Y. Wang, and S. Narayanan, "Tracking changes in continuous emotion state using body language and prosodic cues," in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2288-2291, Jul. 2011.
[31] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, "Canonical correlation analysis: an overview with application to learning methods," Neural Computation, vol. 16, no. 12, pp. 2639-2664, Dec. 2004.
[32] Y. Song, L. P. Morency, and R. Davis, "Learning a sparse codebook of facial and body microexpressions for emotion recognition," in Proc. of 15th ACM International Conference on Multimodal Interaction, pp. 237-244, Dec. 2013.
[33] F. R. Bach and M. I. Jordan, "Kernel independent component analysis," Journal of Machine Learning Research, vol. 3, pp. 1-48, Jul. 2002.
[34] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, New York, 2004.
[35] F. Ringeval, A. Sonderegger, J. Sauer, and D. Lalanne, "Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions," in Proc. of IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, pp. 1-8, Jul. 2013.