http://dx.doi.org/10.3837/tiis.2017.02.028

A Robust Approach for Human Activity Recognition Using 3-D Body Joint Motion Features with Deep Belief Network  

Uddin, Md. Zia (Department of Informatics, University of Oslo)
Kim, Jaehyoun (Department of Computer Education, Sungkyunkwan University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.11, no.2, 2017, pp. 1118-1133
Abstract
Computer vision-based human activity recognition (HAR) has attracted considerable attention in recent years due to its applications in fields such as smart home healthcare for elderly people. A video-based activity recognition system typically aims to react to people's behavior, allowing the system to proactively assist them with their tasks. This work proposes a novel approach for depth video-based human activity recognition using joint-based motion features of depth body shapes and a Deep Belief Network (DBN). First, the different body parts in each depth frame are segmented by means of a trained random forest. Then, motion features representing the magnitude and direction of each joint's displacement in the next frame are extracted. Finally, the features are used to train a DBN, which is later applied for recognition. The proposed HAR approach showed superior performance over conventional approaches on private and public datasets, indicating its promise for practical applications in smartly controlled environments.
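The per-joint motion features described in the abstract (a displacement magnitude plus a direction for each tracked joint between consecutive depth frames) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `extract_motion_features` and the toy 2-joint skeleton are assumptions, and the direction is encoded here as a unit vector rather than whatever angular encoding the paper may use.

```python
import numpy as np

def extract_motion_features(joints):
    """Per-joint motion features between consecutive depth frames.

    joints: array of shape (T, J, 3) -- T frames, J body joints, (x, y, z)
            positions estimated from the segmented depth silhouettes.
    Returns an array of shape (T-1, J, 4): for each joint, the displacement
    magnitude followed by a unit direction vector. Flattened per frame,
    such features could be fed to a classifier such as a DBN.
    """
    disp = np.diff(joints, axis=0)                       # (T-1, J, 3) frame-to-frame displacement
    mag = np.linalg.norm(disp, axis=-1, keepdims=True)   # (T-1, J, 1) motion magnitude
    # Unit direction; joints that did not move get a zero direction vector.
    direction = np.divide(disp, mag, out=np.zeros_like(disp), where=mag > 0)
    return np.concatenate([mag, direction], axis=-1)     # (T-1, J, 4)

# Toy example: 3 frames, 2 joints -- joint 0 moves along y, joint 1 is
# still between the first two frames and then moves along z.
seq = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 2.0, 0.0], [1.0, 0.0, 1.0]],
])
feats = extract_motion_features(seq)
```

With this layout, `feats[t, j]` is `[magnitude, dx, dy, dz]` for joint `j` between frames `t` and `t+1`; stationary joints yield an all-zero feature, which keeps the representation well defined without a division-by-zero special case.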
Keywords
Depth video; 3-D body joints; Deep Belief Network (DBN)
Citations & Related Records
  • Reference
1 A. Jalal, M.Z. Uddin, J.T. Kim, and T.S. Kim, "Recognition of human home activities via depth silhouettes and R transformation for smart homes," Indoor and Built Environment, vol. 21, no. 1, pp. 184-190, 2011.   DOI
2 M. Z. Uddin and T.-S. Kim, "Independent Shape Component-based Human Activity Recognition via Hidden Markov Model," Applied Intelligence, vol. 2, pp. 193-206, 2010.
3 N. Robertson and I. Reid, "A General Method for Human Activity Recognition in Video," Computer Vision and Image Understanding, vol. 104, no. 2, pp. 232-248, 2006.   DOI
4 H. Kang, C. W. Lee, and K. Jung, "Recognition-based gesture spotting in video games," Pattern Recognition Letters, Vol. 25, pp. 1701-1714, 2004.   DOI
5 F.S. Chen, C.M. Fu, and C.L. Huang, "Hand gesture recognition using a real-time tracking method and Hidden Markov Models," Image and Vision Computing, vol. 21, pp. 745-758, 2005.
6 J. Yamato, J. Ohya, and K. Ishii, "Recognizing human action in time-sequential images using hidden markov model," in Proc. of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 379-385, 1992.
7 F. Niu and M. Abdel-Mottaleb, "View-invariant human activity recognition based on shape and motion Features," in Proc. of IEEE Sixth International Symposium on Multimedia Software Engineering, pp. 546-556, 2004.
8 F. Niu and M. Abdel-Mottaleb, "HMM-based segmentation and recognition of human activities from video sequences," in Proc. of IEEE International Conference on Multimedia & Expo, pp. 804-807, 2005.
9 S.-S. Cho, A.-R. Lee, H.-I. Suk, J.-S. Park, and S.-W. Lee, "Volumetric spatial feature representation for view invariant human action recognition using a depth camera," Optical Engineering, vol. 54, no. 3, article 033102, pp. 1-8, 2015.
10 M. E. Hussein, M. Torki, M. A. Gowayyed, and M. ElSaban, "Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations," in Proc. of International Joint Conference on Artificial Intelligence (IJCAI), pp. 2466-2472, 2013.
11 P. Dreuw, H. Ney, G. Martinez, O. Crasborn, J. Piater, J.M. Moya, and M. Wheatley, "The signspeak project - bridging the gap between signers and speakers," in Proc. of International Conference on Language Resources and Evaluation, pp. 476-481, 2010.
12 L. Zhou, W. Li, Y. Zhang, P. Ogunbona, D. T. Nguyen, and H. Zhang, "Discriminative key pose extraction using extended lc-ksvd for action recognition," in Proc. of International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1-8. IEEE, 2014.
13 X. Yang, C. Zhang, and Y. Tian, "Recognizing actions using depth motion maps-based histograms of oriented gradients," in Proc. of ACM International Conference on Multimedia, pp. 1057-1060, 2012.
14 H.S. Koppula, R. Gupta, and A. Saxena, "Learning human activities and object affordances from RGB-D videos," International Journal of Robotics Research, vol. 32, no. 8, pp. 951-970, 2013.   DOI
15 J. Sung, C. Ponce, B. Selman, and A. Saxena, "Unstructured human activity detection from rgbd images," in Proc. of IEEE International Conference on Robotics and Automation, pp. 842-849, 2012.
16 H. Hamer, K. Schindler, E. Koller-Meier, and L. Van Gool, "Tracking a hand manipulating an object," in Proc. of IEEE International Conference on Computer Vision, pp. 1475-1482, 2009.
17 M.Z. Uddin, N.D. Thang, and T.S. Kim, "Human Activity Recognition Using Body Joint Angle Features and Hidden Markov Model," ETRI Journal, pp. 569-579, 2011.
18 S. Izadi, "KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera," in Proc. of ACM User Interface and Software Technologies, pp. 559-568, 2011.
19 G. Hinton, L. Deng, D. Yu, G.E. Dahl, A.R. Mohamed, N. Jaitly, V. Vanhoucke, P. Nguyen, T.N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.   DOI
20 G.E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, pp. 1527-1554, 2006.   DOI
21 P.S. Aleksic and A.K. Katsaggelos, "Automatic facial expression recognition using facial animation parameters and multistream HMMs," IEEE Transactions on Information Forensics and Security, vol. 1, pp. 3-11, 2006.   DOI
22 P. Simari, D. Nowrouzezahrai, E. Kalogerakis, and K. Singh, "Multi-objective shape segmentation and labeling," in Proc. of Eurographics Symposium on Geometry Processing, Vol. 28, pp. 1415-1425, 2009.
23 V. Ferrari, M.-M. Jimenez, and A. Zisserman, "2D Human Pose Estimation in TV Shows," Visual Motion Analysis, LNCS 2009, Vol. 5604, pp. 128-147, 2009.
24 H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "A Full-Body Layered Deformable Model for Automatic Model-Based Gait Recognition," EURASIP Journal on Advances in Signal Processing, Vol. 1, pp. 1-13, 2008.
25 J. Wright and G. Hua, "Implicit Elastic Matching with Random Projections for Pose-Variant face recognition," in Proc. of IEEE conf. on Computer Vision and Pattern Recognition, pp. 1502-1509, 2009.
26 A. Bosch, A. Zisserman, and X. Munoz, "Image classification using random forests and ferns," in Proc. of IEEE Int. Conf. on Computer Vision, pp. 1-8, 2007.
27 M. Z. Uddin and M. M. Hassan, "A Depth Video-Based Facial Expression Recognition System Using Radon Transform, Generalized Discriminant Analysis, and Hidden Markov Model," Multimedia Tools And Applications, Vol. 74, No. 11, pp. 3675-3690, 2015.   DOI
28 W. Li, Z. Zhang, and Z. Liu, "Action recognition based on a bag of 3d points," in Proc. of Workshop on Human Activity Understanding from 3D Data, pp. 9-14, 2010.
29 O. Oreifej and Z. Liu, "Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 716-723, 2013.
30 Y.-m. Song, S. Noh, J. Yu, C.-w. Park, and B.-g. Lee, "Background subtraction based on Gaussian mixture models using color and depth information," in Proc. of International Conference on Control, Automation and Information Sciences (ICCAIS), pp. 132-135, 2014.
31 J. Wang, Z. Liu, Y. Wu, and J. Yuan, "Mining actionlet ensemble for action recognition with depth cameras," in Proc. of 2012 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, pp. 1290-1297, 2012.
32 S. Fothergill, H. Mentis, P. Kohli, and S. Nowozin, "Instructing people for training gestural interactive systems," in Proc. of ACM Conference on Human Factors in Computing Systems, pp. 1737-1746, 2012.
34 S. Yang, C. Yuan, W. Hu, and X. Ding, "A hierarchical model based on latent dirichlet allocation for action recognition," in Proc. of IEEE 22nd International Conference on Pattern Recognition (ICPR), pp. 2613-2618, 2014.
35 P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior recognition via sparse spatio-temporal features," in Proc. of 2nd Joint IEEE Int. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65-72, IEEE, Washington, 2005.
36 I. Laptev, R. Rennes, M. Marszalek, C. Schmid, and B. Rozenfeld, "Learning realistic human actions from movies," in Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1-8, IEEE, Anchorage, 2008.
37 L. Xia and J. Aggarwal, "Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera," in Proc. of IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, pp. 2834-2841, IEEE, Portland, 2013.