http://dx.doi.org/10.4218/etrij.14.0113.0647

Human Action Recognition Based on 3D Human Modeling and Cyclic HMMs  

Ke, Shian-Ru (Department of Electrical Engineering, University of Washington)
Thuc, Hoang Le Uyen (Department of Electronic and Telecommunication Engineering, Danang University of Technology)
Hwang, Jenq-Neng (Department of Electrical Engineering, University of Washington)
Yoo, Jang-Hee (SW.Content Research Laboratory, ETRI)
Choi, Kyoung-Ho (Department of Information and Electronics, Mokpo National University)
Publication Information
ETRI Journal / v.36, no.4, 2014, pp. 662-672
Abstract
Human action recognition is used in areas such as surveillance, entertainment, and healthcare. This paper proposes a system that recognizes both single and continuous human actions from monocular video sequences, based on 3D human modeling and cyclic hidden Markov models (CHMMs). First, for each frame in a monocular video sequence, the 3D coordinates of the joints of a human object performing actions over multiple cycles are extracted using 3D human modeling techniques. The 3D coordinates are then converted into a set of geometrical relational features (GRFs) to reduce dimensionality and increase discrimination. To reduce dimensionality further, k-means clustering is applied to the GRFs to generate clustered feature vectors. These vectors are used to train a separate CHMM for each type of action with the Baum-Welch re-estimation algorithm. To recognize continuous actions, which are concatenations of several distinct action types, a purpose-designed graphical model systematically concatenates the separately trained CHMMs. Experimental results show that the proposed system performs effectively on both single and continuous action recognition problems.
Keywords
Human action recognition; 3D modeling; hidden Markov model; geometrical relational features
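The recognition stage described in the abstract (quantize each frame's feature vector against k-means centroids, then score the resulting symbol sequence against one cyclic HMM per action) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the two toy models `walk` and `wave`, the three-state topology, and the hand-set probabilities. The paper trains its CHMMs with Baum-Welch, which this sketch does not implement.

```python
import math

def cyclic_transitions(n_states, p_stay=0.5):
    """Left-to-right transition matrix whose last state loops back to the first."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        A[i][i] = p_stay
        A[i][(i + 1) % n_states] += 1.0 - p_stay
    return A

def quantize(feature, centroids):
    """Index of the nearest centroid (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, centroids[k])),
    )

def log_likelihood(symbols, pi, A, B):
    """Scaled forward algorithm: log P(symbols | model)."""
    n = len(pi)
    alpha = [pi[i] * B[i][symbols[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(symbols)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbols[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)          # accumulate the log of each scale factor
        alpha = [a / scale for a in alpha]  # renormalize to avoid underflow
    return log_p

def classify(symbols, models):
    """Name of the model with the highest log-likelihood for the sequence."""
    return max(models, key=lambda name: log_likelihood(symbols, *models[name]))

# Hypothetical toy models: three states, three codebook symbols, hand-set values.
PI = [1 / 3] * 3
A = cyclic_transitions(3)
B_WALK = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]  # symbols cycle 0, 1, 2
B_WAVE = [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1]]  # symbols cycle 2, 1, 0
MODELS = {"walk": (PI, A, B_WALK), "wave": (PI, A, B_WAVE)}

print(classify([0, 1, 2, 0, 1, 2], MODELS))  # prints "walk"
```

The wrap-around transition from the last state to the first is what makes the model cyclic: a quasi-periodic action can repeat any number of cycles without needing a longer state chain.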