A Tree Regularized Classifier-Exploiting Hierarchical Structure Information in Feature Vector for Human Action Recognition

Luo, Huiwu;Zhao, Fei;Chen, Shangfeng;Lu, Huanzhang;

doi:10.3837/tiis.2017.03.020

KSII Transactions on Internet and Information Systems (TIIS)

Volume 11, Issue 3 / Pages 1614-1632 / 2017 / 1976-7277 (pISSN) / 1976-7277 (eISSN)

Korean Society for Internet Information (한국인터넷정보학회)

Luo, Huiwu (National Key Laboratory of Automatic Target Recognition (ATR), School of Electronic Science and Engineering, National University of Defense Technology) ;
Zhao, Fei (National Key Laboratory of Automatic Target Recognition (ATR), School of Electronic Science and Engineering, National University of Defense Technology) ;
Chen, Shangfeng (National Key Laboratory of Automatic Target Recognition (ATR), School of Electronic Science and Engineering, National University of Defense Technology) ;
Lu, Huanzhang (National Key Laboratory of Automatic Target Recognition (ATR), School of Electronic Science and Engineering, National University of Defense Technology)

Submitted: 2016.10.03
Reviewed: 2016.12.27
Published: 2017.03.31

https://doi.org/10.3837/tiis.2017.03.020

Abstract

The bag-of-visual-words model is popular in human action recognition, but it usually suffers from two deficiencies: it loses the spatial and temporal configuration of local features, and its feature coding step introduces large quantization error. To overcome both, we combine sparse coding with a spatio-temporal pyramid and take this method as our baseline. The focus of this paper is our observation that the feature vector constructed by the baseline method has a hierarchical structure. To exploit this structure for better recognition accuracy, we propose a tree regularized classifier that encodes it. The main contributions of this paper are threefold. First, we introduce a tree regularized classifier that encodes the hierarchical structure information in the feature vector for human action recognition. Second, we present an optimization algorithm to learn the parameters of the proposed classifier. Third, we evaluate the proposed classifier on the YouTube, Hollywood2, and UCF50 datasets; the experimental results show that it outperforms SVM and other popular classifiers and achieves promising results on all three datasets.
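As an illustration only, the sketch below shows one common way to write a tree-structured group penalty over a feature vector built from spatio-temporal pyramid cells, together with its proximal step. The pyramid layout (cells nested in levels, levels nested in the whole vector), the function names, and the single weight `lam` are assumptions made for this example; the abstract does not specify the paper's exact regularizer or its optimization algorithm. The proximal step relies on a known property of tree-structured groups: applying group soft-thresholding from the leaves up to the root gives the proximal operator of the summed group norms.

```python
import math


def build_pyramid_groups(cells_per_level, dim_per_cell):
    """Index ranges for a hypothetical pyramid layout, children before parents."""
    leaves, levels, offset = [], [], 0
    for n_cells in cells_per_level:
        start = offset
        for _ in range(n_cells):
            leaves.append((offset, offset + dim_per_cell))  # one pyramid cell
            offset += dim_per_cell
        levels.append((start, offset))  # all cells of one pyramid level
    root = [(0, offset)]  # the whole feature vector
    return leaves + levels + root, offset


def group_norm(w, a, b):
    """Euclidean norm of the slice w[a:b]."""
    return math.sqrt(sum(w[i] * w[i] for i in range(a, b)))


def tree_penalty(w, groups, lam):
    """Sum of group norms over every node of the tree, scaled by lam."""
    return lam * sum(group_norm(w, a, b) for a, b in groups)


def tree_prox(w, groups, t):
    """Group soft-thresholding applied leaves first, root last."""
    out = list(w)
    for a, b in groups:
        norm = group_norm(out, a, b)
        scale = max(0.0, 1.0 - t / norm) if norm > 0.0 else 0.0
        for i in range(a, b):
            out[i] *= scale
    return out
```

In a proximal-gradient loop, `tree_prox` would be applied to the classifier weights after each gradient step on the loss. This is one possible scheme under the stated assumptions and not necessarily the algorithm used in the paper.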
