Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion

Zhou, Xuan

doi:10.3745/JIPS.01.0067

Journal of Information Processing Systems, Volume 17, Issue 2, pp. 337-351, 2021

pISSN 1976-913X / eISSN 2092-805X

Korea Information Processing Society (한국정보처리학회)

Zhou, Xuan (Dept. of Information Technology Center, Hangzhou Normal University Qianjiang College)

Received: 2020.04.09
Accepted: 2020.06.17
Published: 2021.04.30

Abstract

Automatically recognizing facial expressions in video sequences is challenging because there is little direct correlation between facial features and subjective emotions in video. To address this problem, a video facial expression recognition method based on a spatiotemporal recurrent neural network and feature fusion is proposed. First, the video is preprocessed. Then, a two-layer cascade structure is used to detect the face in each video frame. Next, two deep convolutional neural networks extract temporal-domain and spatial-domain facial features from the video: the spatial convolutional neural network extracts spatial information features from each static expression frame, and the temporal convolutional neural network extracts dynamic information features from the optical flow computed over multiple expression frames. The spatiotemporal features learned by the two networks are then combined by multiplicative fusion. Finally, the fused features are fed to a support vector machine (SVM) to perform facial expression classification. Experimental results on the eNTERFACE, RML, and AFEW6.0 datasets show that the proposed method achieves recognition rates of 88.67%, 70.32%, and 63.84%, respectively. Comparative experiments show that the proposed method obtains higher recognition accuracy than other recently reported methods.
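
The multiplicative fusion step can be illustrated with a minimal sketch. It assumes that each stream's CNN yields one clip-level feature vector of the same length, and that "multiplication fusion" means an element-wise (Hadamard) product. The optional L2 normalization applied to each vector before fusing is an added assumption, not something the abstract specifies, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit Euclidean length (assumed pre-fusion step)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def multiplicative_fusion(spatial: Sequence[float], temporal: Sequence[float],
                          normalize: bool = True) -> List[float]:
    """Fuse spatial- and temporal-stream features by element-wise product.

    Hypothetical helper: assumes both CNN streams emit vectors of equal length.
    """
    if len(spatial) != len(temporal):
        raise ValueError("spatial and temporal features must have equal length")
    if normalize:
        spatial, temporal = l2_normalize(spatial), l2_normalize(temporal)
    return [s * t for s, t in zip(spatial, temporal)]


if __name__ == "__main__":
    spatial_feat = [0.5, 1.0, 0.0, 2.0]   # assumed clip-level spatial CNN output
    temporal_feat = [1.0, 0.5, 3.0, 1.0]  # assumed optical-flow CNN output, same clip
    print(multiplicative_fusion(spatial_feat, temporal_feat, normalize=False))
    # -> [0.5, 0.5, 0.0, 2.0]
```

Unlike concatenation, the product keeps the original feature dimensionality, and a fused component is large only when both streams respond strongly in that dimension.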
