A Framework for Facial Expression Recognition Combining Contextual Information and Attention Mechanism

Jianzeng Chen; Ningning Chen

doi:10.3745/JIPS.01.0107

Journal of Information Processing Systems

Volume 20, Issue 4 / Pages 535-549 / 2024 / 1976-913X (pISSN) / 2092-805X (eISSN)

Korea Information Processing Society (한국정보처리학회)

Jianzeng Chen (School of Computer and Information Engineering, Nanchang Institute of Technology) ;
Ningning Chen (School of Computer and Information Engineering, Nanchang Institute of Technology)

Received: 2023.05.26
Accepted: 2023.12.09
Published: 2024.08.31

https://doi.org/10.3745/JIPS.01.0107

Abstract

Facial expressions (FEs) serve as fundamental components of human emotion assessment and human-computer interaction. Traditional convolutional neural networks tend to overlook valuable information during FE feature extraction, resulting in suboptimal recognition rates. To address this problem, we propose a deep learning framework that combines hierarchical feature fusion, contextual information, and an attention mechanism for precise FE recognition. We use an enhanced VGGNet16 as the backbone network and introduce an improved group convolutional channel attention (GCCA) module in each block to emphasize crucial expression features. A partial decoder at the end of the backbone network fuses multilevel features into a comprehensive feature map. A reverse attention mechanism then guides the model to refine details layer by layer while introducing contextual information and extracting richer expression features. To make the features more distinguishable, we combine island loss with softmax loss in a joint loss function. Experiments on two open datasets demonstrate the effectiveness of the framework, which achieved an average accuracy of 74.08% on the FER2013 dataset and 98.66% on the CK+ dataset, outperforming advanced methods in both recognition accuracy and stability.
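
As a rough illustration of the joint loss mentioned in the abstract, the following NumPy sketch adds an island loss term to a softmax (cross-entropy) loss. It assumes the usual island loss formulation, namely a center loss plus a penalty on the cosine similarity between every pair of distinct class centers. The function names, the weights `lam` and `lambda1`, and their default values are illustrative assumptions, not values taken from the paper, and the sketch does not reproduce the authors' implementation.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean softmax (cross-entropy) loss over a batch."""
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def island_loss(features, labels, centers, lambda1=10.0):
    """Center loss plus a pairwise cosine penalty between class centers.

    Assumes every class center is a non-zero vector.
    lambda1 is a hypothetical default, not a value from the paper.
    """
    # Center term: pull each feature toward the center of its class.
    diffs = features - centers[labels]
    center_term = 0.5 * np.sum(diffs ** 2)

    # Island term: sum over ordered pairs j != k of (cos(c_j, c_k) + 1),
    # which is zero only when two centers point in opposite directions.
    unit = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    cos = unit @ unit.T
    off_diag = ~np.eye(len(centers), dtype=bool)
    island_term = np.sum(cos[off_diag] + 1.0)

    return center_term + lambda1 * island_term


def joint_loss(logits, features, labels, centers, lam=0.01, lambda1=10.0):
    """Softmax loss plus a weighted island loss (weights are assumptions)."""
    return softmax_cross_entropy(logits, labels) + lam * island_loss(
        features, labels, centers, lambda1
    )
```

In a training loop the class centers would normally be learned alongside the network weights; here they are passed in as a plain array to keep the sketch self-contained.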

Acknowledgement

This work was supported by the Ministry of Education Industry-University Cooperative Education Fund Project (No. 22087166301756).
