Human Action Recognition Using Deep Data: A Fine-Grained Study

  • Rao, D. Surendra (Koneru Lakshmaiah Education Foundation, Guru Nanak Institutions Technical Campus) ;
  • Potturu, Sudharsana Rao (Koneru Lakshmaiah Education Foundation) ;
  • Bhagyaraju, V (Siddhartha Institute of Engineering and Technology)
  • Received: 2022.06.05
  • Published: 2022.06.30

Abstract

Video-based human action recognition [1] is one of the most active fields in computer vision research. Because the depth data [2] captured by Kinect cameras offers advantages over traditional RGB data, research on human action recognition has grown in recent years. In this article, we present a systematic study of strategies for recognizing human activity from depth data. The methods are grouped into depth-map-based and skeleton-based approaches, and several of the more traditional strategies are compared. We then examine the details of the available depth-based action datasets and give a straightforward comparison between them. Finally, we discuss the advantages and disadvantages of depth-based and skeleton-based techniques.

Keywords

References

  1. I. Theodorakopoulos, D. Kastaniotis, G. Economou, S. Fotopoulos, "Pose-based human action recognition via sparse representation in dissimilarity space", Journal of Visual Communication and Image Representation, 2014; 25(1):12-23. https://doi.org/10.1016/j.jvcir.2013.03.008
  2. S. Sempena, N. U Maulidevi and P. R. Aryan, "Human action recognition using Dynamic Time Warping", Proc. of the 2011 International Conference on Electrical Engineering and Informatics, Bandung, Indonesia, 2011.
  3. Chen Chen, Roozbeh Jafari and Nasser Kehtarnavaz, "A survey of depth and inertial sensor fusion for human action recognition", Multimed Tools Appl, 76, 4405-4425, 2017. https://doi.org/10.1007/s11042-015-3177-1
  4. Aggarwal JK, Xia L., "Human activity recognition from 3d data: a review", Pattern Recognition Letters, 48:70-80, 2014. https://doi.org/10.1016/j.patrec.2014.04.011
  5. Klette, R., Tee, G., "Understanding human motion: A historic review", In Rosenhahn, B., Klette, R., Metaxas, D., eds.: Human Motion. Volume 36 of Computational Imaging and Vision. Springer Netherlands (2008) 1-22.
  6. Chen C, Kehtarnavaz N, Jafari R, "A medication adherence monitoring system for pill bottles based on a wearable inertial sensor", 36th IEEE Annual International Conference on Engineering in Medicine and Biology Society (EMBC), 2014, pp. 4983-4986.
  7. Shah M, Javed O, Shafique K. "Automated visual surveillance in realistic scenarios", IEEE Multimedia, 2007; 14(1):30-39. https://doi.org/10.1109/MMUL.2007.3
  8. Aggarwal J, Ryoo M. "Human activity analysis", ACM Comput Surv, Jan. 2011; 43(3):1-43. https://doi.org/10.1145/1922649.1922653
  9. Chen L, Khalil I., "Activity recognition: approaches, practices and trends", In: Activity recognition in pervasive intelligent environments Atlantis ambient and pervasive intelligence, vol. 4; 2011. p. 10-31.
  10. Michalis Vrigkas, Christophoros Nikou and Ioannis A. Kakadiaris, "A Review of Human Activity Recognition Methods", Frontiers in Robotics and AI, Volume 5, Article 28, 2015.
  11. Poppe R., "A survey on vision-based human action recognition", Image Vis Comput., 28(6), 2010, pp. 976-990. https://doi.org/10.1016/j.imavis.2009.11.014
  12. Ramanathan M, Yau WY, Teoh EK, "Human action recognition with video data: research and evaluation challenges", IEEE Trans Human-Machine Systems, 44(5):650-663, 2014. https://doi.org/10.1109/THMS.2014.2325871
  13. Aggarwal JK, Xia L., "Human activity recognition from 3d data: a review", Pattern Recognition Letters, 48:70-80, 2014. https://doi.org/10.1016/j.patrec.2014.04.011
  14. A. Shahroudy, J. Liu, T.-T. Ng, G. Wang. "NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis", IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. Las Vegas, NV; pp. 1010-1019.
  15. C. Chen, R. Jafari, N. Kehtarnavaz., "UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor", IEEE International Conference on Image Processing (ICIP); 2015. Quebec City, Canada; pp. 168-172.
  16. S. Gaglio, G. L. Re, M. Morana., "Human activity recognition process using 3-d posture data", IEEE Transactions on Human-Machine Systems, vol.45, issue 5, 2015, pp.586-597. https://doi.org/10.1109/THMS.2014.2377111
  17. G. Yu, Z. Liu, J. Yuan., "Discriminative Orderlet Mining for Real-Time Recognition of Human-Object Interaction", In: D. Cremers, I. Reid, H. Saito, M.-H. Yang, editors. Computer Vision - ACCV 2014: 12th Asian Conference on Computer Vision, Singapore, Singapore, November 1-5, 2014, Revised Selected Papers, Part V. Image Processing, Computer Vision, Pattern Recognition, and Graphics ed. Springer International Publishing, Cham; 2014. pp. 50-65.
  18. M. Munaro, G. Ballin, S. Michieletto, E. Menegatti., "3D flow estimation for human action recognition from colored point clouds", Biologically Inspired Cognitive Architectures. 2013;5:42-51. https://doi.org/10.1016/j.bica.2013.05.008
  19. F. Negin, F. Ozdemir, C. B. Akgul, K. A. Yuksel, A. Ercil., "A Decision Forest Based Feature Selection Framework for Action Recognition from RGB-Depth Cameras", In: M. Kamel, A. Campilho, editors. Image Analysis and Recognition. Lecture Notes in Computer Science ed. Munich: Springer, Berlin, Heidelberg; 2013. pp. 648-657.
  20. F. Ofli, R. Chaudhry, G. Kurillo, R. Vidal, R. Bajcsy., "Berkeley MHAD: A Comprehensive Multimodal Human Action Database", IEEE Workshop on Applications of Computer Vision (WACV); 2013. Clearwater, Florida; pp. 53-60.
  21. H. S. Koppula, R. Gupta, A. Saxena., "Learning human activities and object affordances from RGB-D videos", International Journal of Robotics Research, 2013; 32 (8):915-970.
  22. O. Oreifej, Z. Liu., "HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences", IEEE Conference on Computer Vision and Pattern Recognition; Portland. 2013. pp. 716-723.
  23. Z. Cheng, L. Qin, Y. Ye, Q. Huang, Q. Tian. "Human Daily Action Analysis with Multi-view and Color-Depth Data", In: A. Fusiello, V. Murino, R. Cucchiara, editors. Computer Vision - ECCV 2012. Workshops and Demonstrations. Lecture Notes in Computer Science ed. Springer, Berlin, Heidelberg; 2012. pp. 52-61.
  24. L. Seidenari, V. Varano, S. Berretti, A. Del Bimbo, P. Pala., "Recognizing Actions from Depth Cameras as Weakly Aligned Multi-part Bag-of-Poses", IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); 2013. Portland, Oregon; pp. 479-485.
  25. J. Wang, Z. Liu, Y. Wu, J. Yuan., "Mining Actionlet Ensemble for Action Recognition with Depth Cameras", IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2012. Providence, Rhode Island; pp. 1290-1297.
  26. V. Bloom, D. Makris, V. Argyriou., "G3D: A Gaming Action Dataset and Real Time Action Recognition Evaluation Framework", IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); 2012. Providence, Rhode Island; pp. 7-12.
  27. L. Xia, C.-C. Chen, J. Aggarwal., "View Invariant Human Action Recognition Using Histograms of 3d Joints", IEEE Conference on Computer Vision and Pattern Recognition Workshops; 2012. Providence, Rhode Island; pp. 20-27.
  28. A. Kurakin, Z. Zhang, Z. Liu., "A Real Time System for Dynamic Hand Gesture Recognition with a Depth Sensor", In: Proceedings of the 20th European Signal Processing Conference (EUSIPCO); 2012. Bucharest, Romania; pp. 1975-1979.
  29. J. Sung, C. Ponce, B. Selman, A. Saxena., "Unstructured Human Activity Detection from RGBD Images", IEEE Conference on Robotics and Automation (ICRA); 2012. St. Paul, Minnesota; pp. 842-849.
  30. Y. C. Lin, M. C. Hu, W. H. Cheng, Y. H. Hsieh, and H. M. Chen, "Human action recognition and retrieval using sole depth information," in ACM MM, 2012, pp. 1053-1056.
  31. W. Li, Z. Zhang, Z. Liu., "Action Recognition Based on a Bag of 3d Points", IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2010. San Francisco, CA; pp. 9-14.
  32. Li, W., Zhang, Z., Liu, Z., "Expandable data-driven graphical modeling of human actions based on salient postures", IEEE Transactions on Circuits and Systems for Video Technology, 18(11) (2008) 1499-1510. https://doi.org/10.1109/TCSVT.2008.2005597
  33. Vieira, A., Nascimento, E., Oliveira, G., Liu, Z., Campos, M., "STOP: Space-time occupancy patterns for 3d action recognition from depth map sequences", In: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, (2012) 252-259.
  34. Wang J, Liu Z, Chorowski J, Chen Z, Wu Y., "Robust 3D action recognition with random occupancy patterns", Computer Vision - ECCV 2012, Lecture Notes in Computer Science; 2012. p. 872-885.
  35. Lee H, Battle A, Raina R, Ng AY., "Efficient sparse coding algorithms", Proc. 19th Ann. Conf. Neural Information Processing Systems; 2007. pp. 801-808.
  36. Jalal A, Uddin MZ, Kim JT, Kim T S., "Recognition of human home activities via depth silhouettes and R transformation for smart homes", Indoor Built Environ, 2012; 21(1):184-190. https://doi.org/10.1177/1420326X11423163
  37. Wang Y, Huang K, Tan T., "Human activity recognition based on R transform", IEEE Conference on Computer Vision and Pattern Recognition; 2007.
  38. Yang, X., Zhang, C., Tian, Y., "Recognizing actions using depth motion maps based histograms of oriented gradients", In: ACM International Conference on Multimedia, (2012) 1057-1060.
  39. Chen, C., Jafari, R., & Kehtarnavaz, N., "Action recognition from depth sequences using depth motion maps-based local binary patterns". In Proc., of 2015 IEEE winter conference on Applications of computer vision (WACV), 2015, pp. 1092-1099.
  40. Chen, C., Liu, M., Zhang, B., Han, J., Jiang, J., & Liu, H., "3d action recognition using multi-temporal depth motion maps and fisher vector", In IJCAI, 2016, pp. 3331-3337.
  41. Mahmoud Al-Faris, John Chiverton, Yanyan Yang and David Ndzi, "Deep Learning of Fuzzy Weighted Multi-Resolution Depth Motion Maps with Spatial Feature Fusion for Action Recognition", J. Imaging 2019, 5, 82; doi:10.3390/jimaging5100082.
  42. Jiang Li, Xiaojuan Ban, Guang Yang, Yitong Li, Yu Wang, "Real-time human action recognition using depth motion maps and convolutional neural networks", International Journal of High Performance Computing and Networking, 2019, Vol. 13, No. 3, pp. 312-320. https://doi.org/10.1504/ijhpcn.2019.098572
  43. Xu Weiyao, Wu Muqing, Zhao Min, Liu Yifeng, Lv Bo, and Xia Ting, "Human Action Recognition Using Multilevel Depth Motion Maps", IEEE Access, Volume 7, 2019, pp. 41811-41822. https://doi.org/10.1109/access.2019.2907720
  44. Wu Li, Q. Wang, and Y. Wang, "Action Recognition Based on Depth Motion Map and Hybrid Classifier", Mathematical Problems in Engineering, Vol. 2018, Article ID 8780105, 10 pages.
  45. Kim D, Yun W. H, Yoon H. S, and Jaehong H. S, "Action recognition with depth maps using hog descriptors of multi-view motion," in proc., of 8th International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM, pp. 2308-4278, 2014.
  46. Chen C, Hou Z, Zhang B, Jiang J, Yang Y., "Gradient local autocorrelations and extreme learning machine for depth-based activity recognition", Advances in Visual Computing Lecture Notes in Computer Science; 2015. pp. 613-623.
  47. Kobayashi T, Otsu N., "Image feature extraction using gradient local autocorrelations", Computer Vision - ECCV 2008, Lecture Notes in Computer Science; 2008. p. 346-358.
  48. Huang G-B, Zhu Q-Y, Siew C. K., "Extreme learning machine: theory and applications", Neurocomputing, 2006; 70(1-3):489-501. https://doi.org/10.1016/j.neucom.2005.12.126
  49. G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Transactions on Systems, Man, and Cybernetics B, vol. 42, no. 2, pp. 513-529, 2012. https://doi.org/10.1109/TSMCB.2011.2168604
  50. Chen C, Zhang B, Hou Z, Jiang J, Liu M, Yang Y., "Action recognition from depth sequences using weighted fusion of 2D and 3D autocorrelation of gradients features", Multimed Tool Appl, 2016;76(3): 4651-69. https://doi.org/10.1007/s11042-016-3284-7
  51. Kobayashi T, Otsu N., "Motion recognition using local auto-correlation of space time gradients", Pattern Recogn Lett, 2012;33(9):1188-1195. https://doi.org/10.1016/j.patrec.2012.01.007
  52. Liu H, He Q, Liu M., "Human action recognition using adaptive hierarchical depth motion maps and Gabor filter", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
  53. Wang P, Li W, Gao Z, Zhang J, Tang C, Ogunbona P., "Deep convolutional neural networks for action recognition using depth map sequences", arXiv preprint arXiv:1501.04686; 2015.
  54. Oreifej, O., Liu, Z., "HON4D: Histogram of oriented 4d normals for activity recognition from depth sequences", IEEE Conference on Computer Vision and Pattern Recognition, (2013)
  55. Zhang, H., Parker, L., "4-dimensional local spatio-temporal features for human activity recognition", In: International Conference on Intelligent Robots and Systems. (2011) 2044-2049
  56. Griffiths, T.L., Steyvers, M., "Finding scientific topics", Proceedings of the National Academy of Sciences of the United States of America 101(Suppl 1) (2004) 5228-5235. https://doi.org/10.1073/pnas.0307752101
  57. Pritchard, J. K.; Stephens, M.; Donnelly, P. (June 2000), "Inference of population structure using multi-locus genotype data", Genetics. 155 (2): pp. 945-959. https://doi.org/10.1093/genetics/155.2.945
  58. Johansson, G., "Visual motion perception", Scientific American (1975).
  59. Ye, M., Wang, X., Yang, R., Ren, L., Pollefeys, M., "Accurate 3d pose estimation from a single depth image", IEEE International Conference on Computer Vision. (2011) 731-738
  60. Criminisi, A., Shotton, J., Robertson, D., Konukoglu, E., "Regression forests for efficient anatomy detection and localization in ct studies", Workshop on Medical Computer Vision. (2010)
  61. Campbell, L., Bobick, A., "Recognition of human body motion using phase space constraints", IEEE International Conference on Computer Vision. (1995) 624-630.
  62. M. Jiang, J. Kong, G. Bebis, H. Huo, "Informative joints based human action recognition using skeleton contexts", Signal Process. Image Commun., 33 (2015) 29-40. https://doi.org/10.1016/j.image.2015.02.004
  63. Koppula, H.S., Gupta, R., Saxena, A.: "Human activity learning using object affordances from RGB-D videos", CoRR abs/1208.0967 (2012)
  64. Lai, K., Bo, L., Ren, X., Fox, D., "Sparse distance learning for object recognition combining RGB and depth information", International Conferences on Robotics and Automation. (2011) 4007-4013.
  65. Sung, J., Ponce, C., Selman, B., Saxena, A., "Human activity detection from RGBD images", In: Plan, Activity, and Intent Recognition. (2011)
  66. Yao, A., Gall, J., Van Gool, L., "Coupled action recognition and pose estimation from multiple views", International Journal of Computer Vision, 100(1) (2012) 16-37 https://doi.org/10.1007/s11263-012-0532-9
  67. Müller, M., Röder, T., Clausen, M., "Efficient content-based retrieval of motion capture data", ACM Transactions on Graphics, 24 (2005) 677-685 https://doi.org/10.1145/1073204.1073247
  68. Gall, J., Yao, A., Razavi, N., Van Gool, L., Lempitsky, V., "Hough forests for object detection, tracking, and action recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence (2011)
  69. Tenorth, M., Bandouch, J., Beetz, M., "The TUM kitchen data set of everyday manipulation activities for motion tracking and action recognition", IEEE Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences, (2009).
  70. X. Yang and Y. L. Tian, "Eigen joints-based action recognition using naive-Bayes-nearest-neighbor," in Computer vision and pattern recognition workshops (CVPRW), 2012, pp. 14-19.
  71. F. Ofli, R. Chaudhry, G. Kurillo, R. Vidal, and R. Bajcsy, "Sequence of the most informative joints (SMIJ): A new representation for human skeletal action recognition", J. Vis. Commun. Image Represent., Vol. 25, no. 1, pp. 24-38, 2014. https://doi.org/10.1016/j.jvcir.2013.04.007
  72. M. Barnachon, S. Bouakaz, B. Boufama, and E. Guillou, "Ongoing human action recognition with motion capture", Pattern Recognit., vol. 47, no. 1, pp. 238-247, 2014. https://doi.org/10.1016/j.patcog.2013.06.020
  73. R. Vemulapalli, F. Arrate, and R. Chellappa, "Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group," CVPR, pp. 588-595, 2014.
  74. Q. Ke, M. Bennamoun, S. An, F. Sohel, and F. Boussaid, "A New Representation of Skeleton Sequences for 3D Action Recognition," in CVPR, June 2017
  75. F. Han, B. Reily, W. Hoff, and H. Zhang, "Space-time representation of people based on 3D skeletal data: a review", arXiv preprint arXiv:1601.01006, 2016.
  76. K. Yun, J. Honorio, D. Chattopadhyay, T. L. Berg, and D. Samaras., "Two-person interaction detection using body pose features and multiple instance learning", IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 28-35, 2012.
  77. S. Yan, Y. Xiong, and D. Lin, "Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition," in AAAI, 2018.
  78. Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P.; et al. 2017. The kinetics human action video dataset. In arXiv:1705.06950.
  79. C. Wolf, J. Mille, E. Lombardi, O. Celiktutan, M. Jiu, M. Baccouche, E. Dellandrea, C.-E. Bichot, C. Garcia, B. Sankur., "The LIRIS Human Activities Dataset and the ICPR 2012 Human Activities Recognition and Localization Competition", In: LIRIS Laboratory, Tech. Rep. RR-LIRIS-2012-004, March 2012
  80. Aouaidjia Kamel, Bin Sheng, Yang Po, Ping Li, and Ruimin Shen, "Deep Convolutional Neural Networks for Human Action Recognition Using Depth Maps and Postures", IEEE Transactions on Systems, Man, and Cybernetics: Systems, Volume: 49 , Issue: 9 , Sept. 2019, pp.1806-1819. https://doi.org/10.1109/tsmc.2018.2850149
  81. C. Linqin, L. Xiaolin, F. Chen, and M. Xiang, "Robust Human Action recognition based on Depth Motion Maps and improved Convolutional Neural Networks", Journal of Electronic Imaging, Vol. 27, No.5, 2018.
  82. Fanjia Li, Aichun Zhu, Yonggang Xu, Ran Cui, and Gang Hua, "Multi-Stream and Enhanced Spatial-Temporal Graph Convolution Network for Skeleton-Based Action Recognition", IEEE Access, Volume 8, 2020, pp. 97757-97770. https://doi.org/10.1109/access.2020.2996779
  83. Yun Han, Sheng Luen Chung, Qiang Xiao, Wei You Lin, and Shun Feng Su, "Global Spatio-Temporal Attention for Action Recognition Based on 3D Human Skeleton Data", IEEE Access, Volume 8, 2020, pp.88604-88616. https://doi.org/10.1109/access.2020.2992740