http://dx.doi.org/10.3837/tiis.2014.02.009

Human Action Recognition Using Pyramid Histograms of Oriented Gradients and Collaborative Multi-task Learning

Gao, Zan (Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology)
Zhang, Hua (Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology)
Liu, An-An (School of Electronic Information Engineering, Tianjin University)
Xue, Yan-Bing (Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology)
Xu, Guang-Ping (Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.8, no.2, 2014, pp. 483-503

Abstract

This paper proposes a human action recognition method based on pyramid histograms of oriented gradients (PHOG) and collaborative multi-task learning. First, we accumulate global activities and construct a motion history image (MHI) for the RGB and depth channels separately, encoding the dynamics of an action in each modality. Different action descriptors are then extracted from the depth and RGB MHIs to represent the global textural and structural characteristics of the actions. Specifically, the average value in hierarchical blocks, GIST, and PHOG descriptors are employed to represent human motion. To assess the proposed descriptors, we evaluate them with KNN, SVM with linear and RBF kernels, SRC, and CRC models on the DHA dataset, a well-known dataset for human action recognition. Large-scale experimental results show that our descriptors are robust, stable, and efficient, and that they outperform state-of-the-art methods. In addition, we investigate combinations of these descriptors on the DHA dataset and observe that the combined descriptors perform much better than any single descriptor. With multimodal features, we also propose a collaborative multi-task learning method for model learning and inference based on transfer learning theory. The main contributions lie in four aspects: 1) the proposed encoding scheme can filter out the stationary parts of the human body and reduce noise interference; 2) different kinds of features and models are assessed, and neighboring gradient information and pyramid layers prove very helpful for representing these actions; 3) the proposed model can fuse features from different modalities regardless of sensor type, value range, and feature dimension; 4) the latent common knowledge among different modalities can be discovered by transfer learning to boost performance.
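The encoding and description steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the frame-difference threshold, the decay duration, the pyramid depth, or the number of orientation bins, so `diff_threshold`, `tau`, `levels`, and `bins` are assumed values, and both function names are hypothetical. The pyramid descriptor here histograms all gradients weighted by magnitude, a simplification of the edge-based PHOG of Bosch et al.

```python
import numpy as np


def motion_history_image(frames, tau=None, diff_threshold=15.0):
    """Temporal-template style MHI for grayscale (or depth) frames shaped (T, H, W).

    tau and diff_threshold are assumed parameters, not values from the paper.
    """
    frames = np.asarray(frames, dtype=np.float32)
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("expected at least two frames shaped (T, H, W)")
    tau = float(tau if tau is not None else frames.shape[0] - 1)
    mhi = np.zeros(frames.shape[1:], dtype=np.float32)
    for prev, curr in zip(frames[:-1], frames[1:]):
        moving = np.abs(curr - prev) > diff_threshold
        # Moving pixels are reset to tau; all other pixels decay by one step,
        # so stationary regions stay at (or fade to) zero.
        mhi = np.where(moving, tau, np.maximum(mhi - 1.0, 0.0))
    return mhi / tau  # scale to [0, 1]


def pyramid_hog(image, levels=2, bins=8):
    """Simplified pyramid of orientation histograms over a 2D image.

    Level l splits the image into 2^l x 2^l cells; each cell contributes one
    magnitude-weighted histogram of unsigned gradient orientations.
    """
    image = np.asarray(image, dtype=np.float32)
    gy, gx = np.gradient(image)  # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned, in [0, pi)
    bin_index = np.minimum((orientation / np.pi * bins).astype(int), bins - 1)
    height, width = image.shape
    histograms = []
    for level in range(levels + 1):
        cells = 2 ** level
        row_edges = np.linspace(0, height, cells + 1).astype(int)
        col_edges = np.linspace(0, width, cells + 1).astype(int)
        for r in range(cells):
            for c in range(cells):
                rows = slice(row_edges[r], row_edges[r + 1])
                cols = slice(col_edges[c], col_edges[c + 1])
                hist = np.bincount(
                    bin_index[rows, cols].ravel(),
                    weights=magnitude[rows, cols].ravel(),
                    minlength=bins,
                )
                histograms.append(hist)
    descriptor = np.concatenate(histograms)
    total = descriptor.sum()
    # L1-normalize so descriptors from different modalities share a scale.
    return descriptor / total if total > 0 else descriptor
```

Under these assumptions, the same two functions would be applied to the RGB (converted to grayscale) and depth sequences independently, yielding one descriptor per modality that a downstream classifier such as KNN or SVM could consume.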

Keywords

Action Recognition; collaborative multi-task learning; PHOG; depth
