http://dx.doi.org/10.12673/jant.2020.24.2.155

Deep Learning-based Action Recognition using Skeleton Joints Mapping  

Tasnim, Nusrat (School of Electronics and Information Engineering, Korea Aerospace University)
Baek, Joong-Hwan (School of Electronics and Information Engineering, Korea Aerospace University)
Abstract
Recently, with advances in computer vision and deep learning, human action recognition has been actively studied for video analysis, video surveillance, interactive multimedia, and human-machine interaction applications. Many researchers have introduced diverse techniques for understanding and classifying human actions using RGB images, depth images, skeleton data, and inertial data. However, skeleton-based action discrimination remains a challenging research topic for human-machine interaction. In this paper, we propose an end-to-end mapping of skeleton joints that generates a spatio-temporal image, the so-called dynamic image, for each action. An efficient deep convolutional neural network is then devised to classify the action classes. We evaluate the proposed method on the publicly available UTD-MHAD skeleton dataset. Experimental results show that the proposed system outperforms existing methods, achieving a high accuracy of 97.45%.
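The abstract describes mapping a skeleton joint sequence into a single spatio-temporal "dynamic image" that a CNN can classify. The following is a minimal sketch of one such mapping, assuming (since the paper's exact encoding is not given here) that joints index the image rows, frames index the columns, and the three spatial coordinates are min-max normalized per channel into RGB; the function name and the normalization scheme are illustrative, not the authors' method.

```python
import numpy as np

def joints_to_dynamic_image(seq):
    """Map a skeleton sequence of shape (T frames, J joints, 3 coords)
    to a (J, T, 3) uint8 image: rows = joints, columns = frames,
    channels = x/y/z, each channel min-max normalized to [0, 255]."""
    seq = np.asarray(seq, dtype=np.float64)    # (T, J, 3)
    img = np.transpose(seq, (1, 0, 2))         # (J, T, 3)
    lo = img.min(axis=(0, 1), keepdims=True)   # per-channel minimum
    hi = img.max(axis=(0, 1), keepdims=True)   # per-channel maximum
    img = (img - lo) / np.maximum(hi - lo, 1e-8) * 255.0
    return img.astype(np.uint8)

# Example: 40 frames of the 20-joint Kinect skeleton used by UTD-MHAD
rng = np.random.default_rng(0)
dynamic = joints_to_dynamic_image(rng.normal(size=(40, 20, 3)))
print(dynamic.shape)  # (20, 40, 3)
```

Because sequences of different lengths yield images of different widths, such dynamic images are typically resized to a fixed resolution before being fed to the classification network.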
Keywords
Action recognition; Deep learning; CNN; End-to-end skeleton joints mapping;
Citations & Related Records
  • Reference
1 Y. Du, W. Wang, and L. Wang, "Hierarchical recurrent neural network for skeleton based action recognition," in IEEE Conference on Computer Vision and Pattern Recognition, Boston: MA, pp. 1110-1118, 2015.
2 X. Yang and Y. Tian, "Super normal vector for activity recognition using depth sequences," in IEEE Conference on Computer Vision and Pattern Recognition, Columbus: OH, pp. 804-811, 2014.
3 V. S. Kulkarni and S. D. Lokhande, "Appearance based recognition of American Sign Language using gesture segmentation," International Journal on Computer Science and Engineering, No. 3, pp. 560-565, 2010.
4 P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior recognition via sparse spatio-temporal features," in IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Breckenridge: CO, pp. 65-72, 2005.
5 D. Wu, and L. Shao, “Silhouette analysis-based action recognition via exploiting human poses,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 23, No. 2, pp. 236-243, 2012.   DOI
6 M. Ahmad and S. W. Lee, "HMM-based human action recognition using multiview image sequences," in 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, pp. 263-266, 2006.
7 L. Xia, C. C. Chen, and J. K. Aggarwal, "View invariant human action recognition using histograms of 3d joints," in 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence: RI, pp. 20-27, 2012.
8 J. Luo, W. Wang, and H. Qi, "Spatio-temporal feature extraction and representation for RGB-D human action recognition," Pattern Recognition Letters, Vol. 50, pp. 139-148, 2014.   DOI
9 V. Megavannan, B. Agarwal, and R. V. Babu, "Human action recognition using depth maps," in 2012 International Conference on Signal Processing and Communications (SPCOM), Piscataway: NJ, pp. 1-5, 2012.
10 J. Trelinski, and B. Kwolek, "Convolutional neural network-based action recognition on depth maps," in International Conference on Computer Vision and Graphics, Warsaw: Poland, pp. 209-221, 2018.
11 M. E. Hussein, M. Torki, M. A. Gowayyed, and M. El-Saban, "Human action recognition using a temporal hierarchy of covariance descriptors on 3d joint locations," in Twenty-Third International Joint Conference on Artificial Intelligence, Beijing: China, pp. 2466-2472, 2013.
12 P. Wang, W. Li, Z. Gao, J. Zhang, C. Tang, and P. O. Ogunbona, “Action recognition from depth maps using deep convolutional neural networks,” IEEE Transactions on Human-Machine Systems, Vol. 46, No. 4, pp. 498-509, 2015.   DOI
13 K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," in Advances in Neural Information Processing Systems, Montreal: Canada, pp. 568-576, 2014.
14 C. Li, Q. Zhong, D. Xie, and S. Pu, "Skeleton-based action recognition with convolutional neural networks," in 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong, pp. 597-600, 2017.
15 Y. Du, Y. Fu, and L. Wang, "Skeleton based action recognition with convolutional neural network," in 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur: Malaysia, pp. 579-583, 2015.
16 P. Wang, Z. Li, Y. Hou, and W. Li, "Action recognition based on joint trajectory maps using convolutional neural networks," in Proceedings of the 24th ACM International Conference on ACM Multimedia, Amsterdam: Netherlands, pp. 102-106, 2016.
17 Y. Hou, Z. Li, P. Wang, and W. Li, “Skeleton optical spectra-based action recognition using convolutional neural networks,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 28, No. 3, pp. 807-811, 2016.   DOI
18 C. Li, Y. Hou, P. Wang, and W. Li, “Joint distance maps-based action recognition with convolutional neural networks,” IEEE Signal Processing Letters, Vol. 24, No. 5, pp. 624-628, 2017.   DOI
19 UTD-MHAD skeleton dataset, University of Texas at Dallas, [Internet]. Available: https://personal.utdallas.edu/~kehtar/UTD-MHAD.html
20 J. Imran and B. Raman, "Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition," Journal of Ambient Intelligence and Humanized Computing, pp. 1-20, 2019.
21 C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of Big Data, Vol. 6, No. 1, p. 60, 2019.   DOI
22 A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems, Lake Tahoe: NV, pp. 1097-1105, 2012.