Classification of human actions using 3D skeleton data: A performance comparison between classical machine learning and deep learning models

Juhwan Kim; Jongchan Kim; Sungim Lee

doi:10.5351/KJAS.2024.37.5.643

The Korean Journal of Applied Statistics (응용통계연구)

Volume 37, Issue 5 / Pages 643-661 / 2024 / 1225-066X (pISSN) / 2383-5818 (eISSN)

The Korean Statistical Society (한국통계학회)

Juhwan Kim (Department of Applied Statistics, Dankook University)
Jongchan Kim (Department of Applied Statistics, Dankook University)
Sungim Lee (Department of Statistics and Data Science, Dankook University)

Received : 2024.07.31
Accepted : 2024.08.29
Published : 2024.10.31

https://doi.org/10.5351/KJAS.2024.37.5.643

Abstract

This study investigates how well 3D skeleton data support human action recognition by comparing the classification performance of classical machine learning and deep learning models. We use a subset of the NTU RGB+D dataset containing only frontal-view recordings of 40 individuals performing 60 different actions. The machine learning models are linear discriminant analysis (LDA), multi-class support vector machine (SVM), and random forest (RF); the deep learning models are the RNN-based hierarchical bidirectional RNN (HBRNN) and the GCN-based semantics-guided neural network (SGN). Model performance is evaluated by cross-subject cross-validation, in which the data are split by participant. The analysis shows that the performance gap between models depends strongly on action type. A cluster analysis of per-action classification performance finds no significant difference between machine learning and deep learning models for large, easily recognizable actions. For actions that are hard to distinguish from frontal-view joint coordinates alone, such as 'clapping' or 'rubbing hands', the deep learning models capture subtle joint movements better than the machine learning models.
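The cross-subject evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. Nothing below is taken from the paper: the data are synthetic stand-ins for skeleton features (the 25 joints x 3 coordinates layout, the number of subjects, actions, and folds, and all hyperparameters are assumptions chosen for illustration), and scikit-learn's `GroupKFold` is used here as one plausible way to keep each subject's recordings entirely in either the training or the test fold. The deep learning models (HBRNN, SGN) are not reproduced.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for skeleton features (NOT the NTU RGB+D data).
# Assumed sizes: 8 subjects x 3 actions x 5 repetitions, each sample a
# flattened vector of 25 joints x 3 coordinates.
n_subjects, n_actions, n_reps, n_features = 8, 3, 5, 75
class_means = rng.normal(0.0, 1.0, size=(n_actions, n_features))

X, y, groups = [], [], []
for subject in range(n_subjects):
    # A per-subject offset mimics differences in body size and posture.
    subject_offset = rng.normal(0.0, 0.3, size=n_features)
    for action in range(n_actions):
        for _ in range(n_reps):
            noise = rng.normal(0.0, 0.5, size=n_features)
            X.append(class_means[action] + subject_offset + noise)
            y.append(action)
            groups.append(subject)
X, y, groups = np.asarray(X), np.asarray(y), np.asarray(groups)

# The three classical models named in the abstract, with assumed settings.
models = {
    "LDA": LinearDiscriminantAnalysis(),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Group-wise folds: no subject contributes to both training and test data.
cv = GroupKFold(n_splits=4)
mean_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, groups=groups, cv=cv)
    mean_accuracy[name] = scores.mean()
    print(f"{name}: mean cross-subject accuracy = {mean_accuracy[name]:.3f}")
```

Splitting by subject rather than by sample matters because repetitions from the same person are correlated; a random split would let a model see the test subject's body proportions during training and could overstate accuracy.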

Acknowledgement

This research was supported by the National Research Foundation of Korea (NRF) with funding from the Korean government (Ministry of Science and ICT) (No. 2019R1A2C1003257).

