
Spatio-temporal Semantic Features for Human Action Recognition  

Liu, Jia (Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University)
Wang, Xiaonian (Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University)
Li, Tianyu (Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University)
Yang, Jie (Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.6, no.10, 2012, pp. 2632-2649
Abstract
Most approaches to human action recognition are limited because they rely on simple action datasets recorded in controlled environments, or because they focus on excessively localized features without sufficiently exploring spatio-temporal information. This paper proposes a framework for recognizing realistic human actions. Specifically, a new action representation is proposed, based on a rich set of descriptors computed from keypoint trajectories. To obtain efficient and compact action representations, we develop a feature fusion method that combines spatio-temporal local motion descriptors according to the camera movement, which is detected from the distribution of spatio-temporal interest points in the clips. A new topic model, called the Markov Semantic Model, is proposed for semantic feature selection; it relies on the different kinds of dependencies between words produced by "syntactic" and "semantic" constraints. Informative features are selected collaboratively, based on the different types of dependencies between words produced by short-range and long-range constraints. Building on nonlinear SVMs, we validate the proposed hierarchical framework on several realistic action datasets.
Keywords
action recognition; spatio-temporal features; topic model; Markov model