http://dx.doi.org/10.9717/kmms.2022.25.8.1049

Optimization of Action Recognition based on Slowfast Deep Learning Model using RGB Video Data  

Jeong, Jae-Hyeok (Dept. of Human Intelligence & Robot Engineering, Sangmyung University)
Kim, Min-Suk (Dept. of Human Intelligence & Robot Engineering, Sangmyung University)
Abstract
Human Action Recognition (HAR) and related tasks such as anomaly detection and object detection have become a research trend in fields that use Artificial Intelligence (AI) methods to analyze patterns of human action in crime-prone areas, media services, and industrial facilities. In real-time systems that process streaming video in particular, HAR has become an increasingly important AI research area for application development, and many HAR-based methods are currently being developed and improved. In this paper, we propose and analyze a deep-learning-based HAR scheme that uses intelligent AI models to operate more efficiently. The system can be applied to media services that use RGB video streaming data, without a feature-extraction pre-processing step. As the method, we adopt Slowfast, a Deep Neural Network (DNN) model, and apply it to an open dataset (HMDB-51 or UCF101) to improve prediction accuracy.
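The abstract says Slowfast consumes raw RGB frames directly, with no separate feature-extraction step. As a rough illustration of what the two pathways receive, the sketch below samples one raw clip at two frame rates: the Slow pathway takes every `tau`-th frame, and the Fast pathway takes `alpha` times as many frames. The defaults `alpha = 8` and `tau = 16` follow the values reported in the original SlowFast paper (Feichtenhofer et al., ICCV 2019). This article's abstract does not state which settings were used, and the function name `pack_pathways` is hypothetical, not part of any library.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def pack_pathways(
    frames: Sequence[Frame], alpha: int = 8, tau: int = 16
) -> Tuple[List[Frame], List[Frame]]:
    """Hypothetical helper: split one raw RGB clip into Slow and Fast inputs.

    Slow pathway: every `tau`-th frame (low temporal rate, for spatial semantics).
    Fast pathway: every `tau // alpha`-th frame (alpha times more frames,
    for fine temporal motion). alpha=8 and tau=16 are assumed defaults
    taken from the SlowFast paper, not from this article.
    """
    if alpha <= 0 or tau <= 0 or tau % alpha != 0:
        raise ValueError("tau must be a positive multiple of alpha")
    fast_stride = tau // alpha
    slow = list(frames[::tau])          # e.g. 64 raw frames -> 4 frames
    fast = list(frames[::fast_stride])  # e.g. 64 raw frames -> 32 frames
    return slow, fast


if __name__ == "__main__":
    # Stand-in for 64 decoded RGB frames: integers used as frame indices.
    clip = list(range(64))
    slow, fast = pack_pathways(clip)
    print(len(slow), len(fast))
```

With a 64-frame clip, the Slow pathway sees frames 0, 16, 32 and 48, while the Fast pathway sees 32 frames at stride 2. In a real pipeline the elements would be decoded image tensors rather than integers.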
Keywords
Human Action Recognition (HAR); RGB Video; Deep Learning (DL); Slowfast
Citations & Related Records
Times Cited By KSCI: 4