DOI: http://dx.doi.org/10.4218/etrij.2018-0102

Vector space based augmented structural kinematic feature descriptor for human activity recognition in videos  

Dharmalingam, Sowmiya (Department of Information and Communication Engineering, Anna University)
Palanisamy, Anandhakumar (Department of Computer Technology, Madras Institute of Technology, Anna University)
Publication Information
ETRI Journal, vol. 40, no. 4, 2018, pp. 499-510
Abstract
A vector space based augmented structural kinematic (VSASK) feature descriptor is proposed for human activity recognition. An action descriptor is built by integrating the structural and kinematic properties of the actor through a vector space based augmented matrix representation. Local or global information alone may not capture sufficient action characteristics; the proposed descriptor therefore combines the local (pose) and global (position and velocity) features in an augmented matrix schema, which increases its robustness. A multiclass support vector machine (SVM) learns each action descriptor for the corresponding activity classification and understanding. The performance of the proposed descriptor is experimentally evaluated on the Weizmann and KTH datasets, where it achieves average recognition rates of 100% and 99.89%, respectively. The computational time for descriptor learning is 0.003 seconds, an improvement of approximately 1.4% over existing methods.
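To make the descriptor construction concrete, the sketch below (Python, using NumPy and scikit-learn) stacks per-frame pose features with actor positions and their frame-to-frame velocities into an augmented matrix, pools it over time into a fixed-length vector, and trains a multiclass SVM. This is a minimal illustration under assumed shapes and synthetic placeholder data, not the authors' implementation; the function name vsask_descriptor and the feature dimensions are hypothetical.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def vsask_descriptor(poses, positions):
    """Combine local (pose) and global (position, velocity) features
    into an augmented matrix, then pool over time.

    poses:     (T, P) per-frame pose/structural features (assumed given)
    positions: (T, 2) per-frame actor centroid (assumed given)
    """
    velocities = np.gradient(positions, axis=0)            # kinematic component
    augmented = np.hstack([poses, positions, velocities])  # (T, P + 4) augmented matrix
    return augmented.mean(axis=0)                          # fixed-length clip descriptor

# Synthetic stand-in data: 40 clips, 30 frames each, 10-D pose features, 3 classes.
# In the paper these features come from the actor; here they are random placeholders.
clips = [(rng.normal(size=(30, 10)), rng.normal(size=(30, 2))) for _ in range(40)]
labels = rng.integers(0, 3, size=40)

X = np.vstack([vsask_descriptor(p, c) for p, c in clips])
clf = SVC(kernel="rbf", decision_function_shape="ovr")     # multiclass SVM (one-vs-rest)
clf.fit(X, labels)
print(clf.predict(X[:5]))

Time-averaging is used here only as a simple pooling choice to obtain one vector per clip; the full text specifies the actual augmented matrix construction and learning setup.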
Keywords
human activity recognition; kinematic features; multiclass support vector machine classifier; structural features; vector space based augmented structural kinematic