Vector space based augmented structural kinematic feature descriptor for human activity recognition in videos

  • Received : 2018.02.28
  • Accepted : 2018.06.25
  • Published : 2018.08.07

Abstract

A vector space based augmented structural kinematic (VSASK) feature descriptor is proposed for human activity recognition. An action descriptor is built by integrating the structural and kinematic properties of the actor using a vector space based augmented matrix representation. Local or global information used alone may not capture sufficient action characteristics. The proposed action descriptor combines the local (pose) and global (position and velocity) features through an augmented matrix schema, thereby increasing the robustness of the descriptor. A multiclass support vector machine (SVM) learns each action descriptor for the corresponding activity classification and understanding. The performance of the proposed descriptor is experimentally analyzed using the Weizmann and KTH datasets, where the average recognition rate is 100% and 99.89%, respectively. The computational time for learning the proposed descriptor is 0.003 seconds, an improvement of approximately 1.4% over existing methods.
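To make the pipeline concrete, the sketch below shows one plausible reading of the approach: per-frame local pose features are stacked column-wise with global position and velocity features into an augmented matrix, which is flattened into a fixed-length descriptor and fed to a multiclass SVM. This is not the authors' implementation: the helper augmented_descriptor, all feature widths, and the six-class toy labels are illustrative assumptions, and scikit-learn's SVC stands in for the paper's multiclass SVM.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def augmented_descriptor(pose, position, velocity):
        """Stack local (pose) and global (position, velocity) features
        column-wise into one augmented matrix, then flatten it into a
        fixed-length descriptor. Shapes are illustrative, not the paper's."""
        augmented = np.hstack([pose, position, velocity])  # (frames, d_pose + d_pos + d_vel)
        return augmented.ravel()

    # Toy data: 40 clips of 20 frames each; feature widths are placeholders.
    rng = np.random.default_rng(0)
    n_clips, n_frames = 40, 20
    X = np.array([
        augmented_descriptor(
            rng.normal(size=(n_frames, 10)),  # local pose features per frame
            rng.normal(size=(n_frames, 2)),   # global centroid position per frame
            rng.normal(size=(n_frames, 2)),   # global centroid velocity per frame
        )
        for _ in range(n_clips)
    ])
    y = rng.integers(0, 6, size=n_clips)  # six hypothetical action labels

    # SVC handles the multiclass case out of the box (one-vs-one);
    # features are standardized before training.
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    clf.fit(X, y)
    print(clf.predict(X[:5]))

A real evaluation would extract the pose, position, and velocity features from tracked actors in the Weizmann or KTH videos rather than from random arrays as in this toy example.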
