http://dx.doi.org/10.9717/kmms.2020.23.12.1540

Human Activity Recognition Based on 3D Residual Dense Network  

Park, Jin-Ho (Dept. of Information and Communication Engineering, Tongmyong University)
Lee, Eung-Joo (Dept. of Information and Communication Engineering, Tongmyong University)
Abstract
Existing human activity recognition algorithms cannot fully exploit the multi-level spatio-temporal information available in a network. To address this problem, a human activity recognition algorithm based on a three-dimensional (3D) residual dense network is proposed. First, the algorithm uses a 3D residual dense block as the basic module of the network; the block extracts hierarchical features of human behavior through densely connected convolutional layers. Second, an adaptive local feature aggregation method learns the local dense features of human behavior. Third, residual connections are applied to improve the flow of feature information and reduce the difficulty of training. Finally, multi-level local feature extraction is realized by cascading multiple 3D residual dense blocks, and an adaptive global feature aggregation method learns the features of all network layers to recognize human behavior. Extensive experiments on the KTH benchmark dataset show that the proposed algorithm reaches a recognition rate (top-1 accuracy) of 93.52%, an improvement of 3.93 percentage points over the three-dimensional convolutional neural network (C3D) algorithm. The proposed framework shows good robustness and transfer learning ability and can effectively handle a variety of video behavior recognition tasks.
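The pipeline summarized in the abstract can be sketched as one forward pass. Its stages are densely connected 3D convolutions inside a block, adaptive local feature aggregation, a residual connection, cascaded blocks, and adaptive global feature aggregation. This is not the authors' implementation. It assumes the block layout of the residual dense network (RDN) design from image super-resolution and reads "local/global feature aggregation" as a 1x1x1 fusion convolution over concatenated features. The function names `conv3d`, `residual_dense_block` and `residual_dense_network` are hypothetical, as are the growth rate, layer and block counts, the pooling-plus-linear classifier head, and the toy clip size. The sketch uses only the Python standard library and random, untrained weights, so it illustrates data flow and channel bookkeeping, not recognition accuracy.

```python
import itertools
import random

# Hypothetical layout for this sketch: a feature map is a list of channels,
# each channel a flat list of T*H*W activations in (t, h, w) row-major order.


def conv3d(x, dims, out_ch, k, rng):
    """Stride-1, zero-padded 3D convolution (k = 1 or 3) with random weights.

    Weights are sampled fresh on each call, so this only illustrates a single
    untrained forward pass; the output keeps the input's T, H, W.
    """
    T, H, W = dims
    p = k // 2
    offsets = list(itertools.product(range(-p, p + 1), repeat=3))
    out = []
    for _ in range(out_ch):
        # One k*k*k kernel per input channel for this output channel.
        kernels = [[rng.uniform(-0.1, 0.1) for _ in offsets] for _ in x]
        y = [0.0] * (T * H * W)
        for t, h, w in itertools.product(range(T), range(H), range(W)):
            s = 0.0
            for ch, kern in zip(x, kernels):
                for wt, (dt, dh, dw) in zip(kern, offsets):
                    tt, hh, ww = t + dt, h + dh, w + dw
                    if 0 <= tt < T and 0 <= hh < H and 0 <= ww < W:
                        s += wt * ch[(tt * H + hh) * W + ww]
            y[(t * H + h) * W + w] = s
        out.append(y)
    return out


def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]


def residual_dense_block(x, dims, growth, n_layers, rng):
    """3D residual dense block: dense 3x3x3 layers, local fusion, local residual."""
    features = list(x)  # starts with the block input (G0 channels)
    for _ in range(n_layers):
        # Layer c (0-based) sees the block input plus every earlier layer's
        # output (G0 + c*growth channels) and adds `growth` new channels.
        features = features + relu(conv3d(features, dims, growth, 3, rng))
    # Local feature fusion: a 1x1x1 conv maps G0 + n_layers*growth channels back to G0.
    fused = conv3d(features, dims, len(x), 1, rng)
    # Local residual learning: add the block input to the fused features.
    return [[a + b for a, b in zip(xc, fc)] for xc, fc in zip(x, fused)]


def residual_dense_network(clip, dims, g0, n_blocks, growth, n_layers, n_classes, rng):
    """Cascade residual dense blocks, fuse all block outputs globally, classify."""
    h = relu(conv3d(clip, dims, g0, 3, rng))  # shallow features: RGB -> G0 channels
    block_outputs = []
    for _ in range(n_blocks):
        h = residual_dense_block(h, dims, growth, n_layers, rng)
        block_outputs.append(h)
    # Global feature fusion: concatenate every block's output (n_blocks*G0
    # channels) and compress back to G0 channels with a 1x1x1 conv.
    concat = [ch for out in block_outputs for ch in out]
    fused = conv3d(concat, dims, g0, 1, rng)
    # Global average pooling over (T, H, W), then a random linear classifier.
    pooled = [sum(ch) / len(ch) for ch in fused]
    return [sum(rng.uniform(-0.1, 0.1) * v for v in pooled) for _ in range(n_classes)]


if __name__ == "__main__":
    rng = random.Random(0)
    dims = (4, 4, 4)  # toy clip: 4 frames of 4x4 pixels
    clip = [[rng.random() for _ in range(4 * 4 * 4)] for _ in range(3)]  # RGB
    logits = residual_dense_network(clip, dims, g0=4, n_blocks=3, growth=4,
                                    n_layers=3, n_classes=6, rng=rng)
    print("predicted class index:", max(range(6), key=logits.__getitem__))
```

Here `n_classes=6` matches the six action classes of KTH. With G0 = 4, a growth rate of 4 and three layers per block, the dense concatenation inside each block reaches 4 + 3 × 4 = 16 channels before the 1x1x1 fusion convolution returns it to 4. That fusion step keeps the block output the same width as the block input, which is what allows the residual addition.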
Keywords
Human Activity Recognition; Video Classification; 3D Residual Dense Network; Deep Learning; Feature Fusion