Emotion Recognition Implementation with Multimodalities of Face, Voice and EEG

  • Udurume, Miracle (Department of Aeronautics Mechanical and Electronic Convergence Engineering, Kumoh National Institute of Technology) ;
  • Caliwag, Angela (Department of Aeronautics Mechanical and Electronic Convergence Engineering, Kumoh National Institute of Technology) ;
  • Lim, Wansu (Department of Aeronautics Mechanical and Electronic Convergence Engineering, Kumoh National Institute of Technology) ;
  • Kim, Gwigon (Department of Business Administration, Kumoh National Institute of Technology)
  • Received : 2022.04.21
  • Accepted : 2022.09.08
  • Published : 2022.09.30

Abstract

Emotion recognition is an essential component of complete interaction between humans and machines. The challenges in emotion recognition arise from the different forms in which emotions are expressed, such as visual, sound, and physiological signals. Recent advancements in the field show that combining modalities, such as visual, voice, and electroencephalography (EEG) signals, leads to better results than using single modalities separately. Previous studies have explored the use of multiple modalities for accurate prediction of emotion; however, the number of studies on real-time implementation is limited because of the difficulty of implementing multiple modalities of emotion recognition simultaneously. In this study, we propose an emotion recognition system for real-time implementation. Our model is built with a multithreading block that runs each modality in a separate thread for continuous synchronization. First, we achieved emotion recognition for each modality separately before enabling the multithreaded system. To verify the correctness of the results, we compared the accuracy of unimodal and multimodal emotion recognition in real time. The experimental results show that the proposed model recognizes user emotions in real time. In addition, the effectiveness of the multiple modalities for emotion recognition was observed. Our multimodal model obtained an accuracy of 80.1%, whereas the unimodal models obtained accuracies of 70.9%, 54.3%, and 63.1%.
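
The abstract describes a multithreading block that runs each modality in its own thread so that the streams stay continuously synchronized while a combined prediction is produced. The snippet below is a minimal sketch of that structure, not the authors' implementation: the per-modality classifiers are replaced by a placeholder that returns random probabilities, the fusion step is a simple probability average, and all names (fake_predict, modality_worker, the label set, the thread periods) are illustrative assumptions.

```python
# Sketch of a thread-per-modality emotion recognition loop.
# Each modality (face, voice, EEG) runs in its own thread and publishes its
# latest class probabilities; a fusion step averages whatever is available.
import threading
import time
import random

EMOTIONS = ["happy", "sad", "angry", "neutral"]  # assumed label set

latest = {}                    # modality name -> latest probability vector
lock = threading.Lock()        # guards access to `latest`
stop = threading.Event()       # signals all worker threads to exit

def fake_predict(modality):
    """Placeholder for a per-modality classifier (returns random probabilities)."""
    probs = [random.random() for _ in EMOTIONS]
    total = sum(probs)
    return [p / total for p in probs]

def modality_worker(modality, period_s):
    """One thread per modality, mirroring the multithreading block in the paper."""
    while not stop.is_set():
        probs = fake_predict(modality)      # replace with real model inference
        with lock:
            latest[modality] = probs
        time.sleep(period_s)                # modality-specific frame/sample rate

def fuse_and_report():
    """Simple late fusion: average the available per-modality probabilities."""
    with lock:
        snapshots = list(latest.values())
    if not snapshots:
        return None
    fused = [sum(col) / len(snapshots) for col in zip(*snapshots)]
    return EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]

if __name__ == "__main__":
    workers = [
        threading.Thread(target=modality_worker, args=("face", 0.03), daemon=True),
        threading.Thread(target=modality_worker, args=("voice", 0.10), daemon=True),
        threading.Thread(target=modality_worker, args=("eeg", 0.05), daemon=True),
    ]
    for w in workers:
        w.start()
    for _ in range(5):                      # run a few fusion cycles, then stop
        time.sleep(0.2)
        print("fused emotion:", fuse_and_report())
    stop.set()
```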

Keywords

Acknowledgement

The work reported in this paper was conducted during the sabbatical year of Kumoh National Institute of Technology in 2019.
