http://dx.doi.org/10.5370/JEET.2015.10.4.1851

Multiscale Spatial Position Coding under Locality Constraint for Action Recognition  

Yang, Jiang-feng (School of Communication and Engineering Information, University of Electronic Science and Technology of China (UESTC))
Ma, Zheng (School of Communication and Engineering Information, UESTC)
Xie, Mei (School of Electronic Engineering, UESTC)
Publication Information
Journal of Electrical Engineering and Technology / v.10, no.4, 2015, pp. 1851-1863
Abstract
In this paper, to address the traditional bag-of-features model's neglect of the spatial relationships among local features in human action recognition, we propose a Multiscale Spatial Position Coding under Locality Constraint method. Specifically, to describe these spatial relationships, we propose a mixed feature that combines a motion feature with a multi-spatial-scale configuration. To exploit temporal information between features, sub-spatial-temporal volumes (sub-STVs) are built. The pooled features of the sub-STVs are then obtained via max pooling. In the classification stage, Locality-Constrained Group Sparse Representation is adopted to exploit the intrinsic group information of the sub-STV features. Experimental results on the KTH, Weizmann, and UCF Sports datasets show that our action recognition system outperforms recently published recognition systems based on classical local spatio-temporal (ST) features.
Keywords
Action recognition; Action representation; Multiscale spatial position coding under locality constraint
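The max-pooling step over sub-STVs mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how sub-STVs are formed, so the sketch assumes they are equal-length temporal segments of the video, and the function name `max_pool_sub_stvs` and its parameters are hypothetical. Each local feature is assumed to be already coded into a fixed-length vector and tagged with its frame index.

```python
from typing import List, Sequence, Tuple

# (frame index t, coded feature vector); the coding step itself is not shown.
Feature = Tuple[int, Sequence[float]]


def max_pool_sub_stvs(
    features: List[Feature], num_frames: int, num_segments: int
) -> List[List[float]]:
    """Max-pool coded local features inside each temporal sub-volume.

    Assumption: sub-STVs are equal-length temporal segments of the video.
    Segments that contain no feature are returned as all-zero vectors.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    if not features:
        return []

    dim = len(features[0][1])
    pooled: List[object] = [None] * num_segments

    for t, code in features:
        if not 0 <= t < num_frames:
            raise ValueError(f"frame index {t} outside [0, {num_frames})")
        if len(code) != dim:
            raise ValueError("all coded features must have the same length")
        # Map the frame index to its segment; clamp to the last segment.
        seg = min(t * num_segments // num_frames, num_segments - 1)
        if pooled[seg] is None:
            pooled[seg] = list(code)
        else:
            # Element-wise maximum over all codes seen in this segment.
            pooled[seg] = [max(a, b) for a, b in zip(pooled[seg], code)]

    return [p if p is not None else [0.0] * dim for p in pooled]


if __name__ == "__main__":
    codes = [(0, [0.1, 0.5]), (1, [0.4, 0.2]), (5, [0.3, 0.9])]
    print(max_pool_sub_stvs(codes, num_frames=6, num_segments=2))
```

In this sketch, each sub-STV yields one pooled vector, and the set of pooled vectors for a video would then be passed to the classifier as a group.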