http://dx.doi.org/10.9717/kmms.2020.23.8.977

Human Motion Recognition Based on Spatio-temporal Convolutional Neural Network  

Hu, Zeyuan (Dept. of Information Communication Engineering, Tongmyong University)
Park, Sange-yun (Academic Affairs and Register Team, Silla University)
Lee, Eung-Joo (Dept. of Information Communication Engineering, Tongmyong University)
Abstract
To address complex feature extraction and low accuracy in human action recognition, this paper proposes a network structure that combines batch normalization with the GoogLeNet model. Batch normalization, originally applied in image classification, is carried over to action recognition by normalizing the network's input training samples per mini-batch. The convolutional network takes RGB images as its spatial input and stacked optical flows as its temporal input, and the spatial and temporal networks are then fused to produce the final action recognition result. The architecture was trained and evaluated on the standard video action benchmarks UCF101 and HMDB51, achieving accuracies of 93.42% and 67.82%, respectively. The results show that the improved convolutional neural network significantly raises the recognition rate and has clear advantages for action recognition.
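The two ideas named in the abstract, per-mini-batch normalization and fusion of the spatial and temporal streams, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scalar `gamma` and `beta` parameters, and the fusion rule (a weighted average of per-stream softmax scores with a hypothetical `temporal_weight`) are assumptions chosen for illustration, since the abstract does not state how the streams are fused.

```python
import math


def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over a mini-batch.

    `batch` is a list of equal-length feature lists. `gamma` and `beta`
    stand in for the learned scale and shift (scalars here for brevity).
    """
    n = len(batch)
    dims = len(batch[0])
    # Per-feature mean and (biased) variance across the mini-batch.
    means = [sum(x[j] for x in batch) / n for j in range(dims)]
    variances = [
        sum((x[j] - means[j]) ** 2 for x in batch) / n for j in range(dims)
    ]
    return [
        [
            gamma * (x[j] - means[j]) / math.sqrt(variances[j] + eps) + beta
            for j in range(dims)
        ]
        for x in batch
    ]


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Late fusion (assumed): weighted average of the two streams' softmax scores.

    Returns the index of the predicted class and the fused score list.
    """
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    fused = [
        (1.0 - temporal_weight) * s + temporal_weight * t
        for s, t in zip(spatial, temporal)
    ]
    return fused.index(max(fused)), fused


# Usage: normalize a toy mini-batch, then fuse toy per-stream class logits.
normalized = batch_normalize([[1.0, 10.0], [3.0, 30.0]])
label, fused = fuse_streams([2.0, 0.0, 0.0], [0.0, 3.0, 0.0])
```

In this sketch the spatial logits would come from the RGB stream and the temporal logits from the stacked optical flow stream; averaging class scores is only one of several plausible fusion choices.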
Keywords
Convolutional Neural Network; Deep Learning; Human Activity Recognition; Spatio-temporal Convolutional Neural Network