A Framework for Facial Expression Recognition Combining Contextual Information and Attention Mechanism

  • Jianzeng Chen (School of Computer and Information Engineering, Nanchang Institute of Technology)
  • Ningning Chen (School of Computer and Information Engineering, Nanchang Institute of Technology)
  • Received : 2023.05.26
  • Accepted : 2023.12.09
  • Published : 2024.08.31

Abstract

Facial expressions (FEs) are fundamental cues for human emotion assessment and human-computer interaction. Traditional convolutional neural networks tend to overlook valuable information during FE feature extraction, which leads to suboptimal recognition rates. To address this problem, we propose a deep learning framework that combines hierarchical feature fusion, contextual information, and an attention mechanism for precise FE recognition. We use an enhanced VGGNet16 as the backbone network and introduce an improved group convolutional channel attention (GCCA) module in each block to emphasize the crucial expression features. A partial decoder added at the end of the backbone network fuses multilevel features into a comprehensive feature map. A reverse attention mechanism then guides the model to refine details layer by layer while introducing contextual information and extracting richer expression features. To make the features more distinguishable, we combine island loss with softmax loss in a joint loss function. Experiments on two open datasets demonstrate the effectiveness of the framework: it achieves an average accuracy of 74.08% on the FER2013 dataset and 98.66% on the CK+ dataset, outperforming advanced methods in both recognition accuracy and stability.
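The joint loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes the island loss formulation commonly given in the literature (a center loss plus a penalty of cosine similarity + 1 over every ordered pair of distinct class centers) and adds it to the softmax loss with a weighting factor. The function names, the weights `lam` and `lambda1`, and the use of plain Python lists are illustrative assumptions, not details taken from the paper.

```python
import math

def softmax_cross_entropy(logits, label):
    """Softmax loss for one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def center_loss(features, labels, centers):
    """Half the summed squared distance from each feature to its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        total += 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    return total

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def island_loss(features, labels, centers, lambda1=10.0):
    """Center loss plus a penalty that pushes class centers apart.

    The penalty sums (cosine + 1) over all ordered pairs of distinct
    centers, so it is zero only when the centers point in opposite
    directions. lambda1 is a hypothetical weight.
    """
    penalty = 0.0
    for j, cj in enumerate(centers):
        for k, ck in enumerate(centers):
            if j != k:
                penalty += cosine(cj, ck) + 1.0
    return center_loss(features, labels, centers) + lambda1 * penalty

def joint_loss(logits_batch, features, labels, centers, lam=0.01, lambda1=10.0):
    """Softmax loss summed over the batch plus lam times the island loss."""
    softmax_term = sum(
        softmax_cross_entropy(z, y) for z, y in zip(logits_batch, labels)
    )
    return softmax_term + lam * island_loss(features, labels, centers, lambda1)

# Toy usage: two classes, 2-D features, one sample per class.
centers = [[1.0, 0.0], [0.0, 1.0]]
features = [[0.9, 0.1], [0.2, 0.8]]
logits = [[2.0, 0.5], [0.3, 1.7]]
labels = [0, 1]
loss = joint_loss(logits, features, labels, centers)
```

In this sketch the softmax term separates the classes, the center term pulls each feature toward its class center, and the pairwise penalty spreads the centers apart. In a real training setup the centers would be learned alongside the network weights rather than fixed as they are here.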

Acknowledgement

This work was supported by the Ministry of Education Industry-University Cooperative Education Fund Project (No. 22087166301756).
