Human Action Recognition Based on 3D Human Modeling and Cyclic HMMs

  • Ke, Shian-Ru (Department of Electrical Engineering, University of Washington)
  • Thuc, Hoang Le Uyen (Department of Electronic and Telecommunication Engineering, Danang University of Technology)
  • Hwang, Jenq-Neng (Department of Electrical Engineering, University of Washington)
  • Yoo, Jang-Hee (SW.Content Research Laboratory, ETRI)
  • Choi, Kyoung-Ho (Department of Information and Electronics, Mokpo National University)
  • Received: 2013.07.02
  • Accepted: 2013.12.03
  • Published: 2014.08.01

Abstract

Human action recognition is used in areas such as surveillance, entertainment, and healthcare. This paper proposes a system that recognizes both single and continuous human actions from monocular video sequences, based on 3D human modeling and cyclic hidden Markov models (CHMMs). First, for each frame of a monocular video sequence, 3D human modeling techniques extract the 3D coordinates of the joints of a human subject performing actions over multiple cycles. The 3D coordinates are then converted into a set of geometrical relational features (GRFs) to reduce dimensionality and improve discrimination. For further dimensionality reduction, k-means clustering is applied to the GRFs to generate clustered feature vectors. These vectors are used to train a separate CHMM for each type of action with the Baum-Welch re-estimation algorithm. To recognize continuous actions, which are concatenations of several distinct action types, a purpose-built graphical model systematically concatenates the separately trained CHMMs. Experimental results show that the proposed system performs effectively on both single and continuous action recognition.
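The recognition stage described in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper. Feature vectors are mapped to the index of their nearest cluster centroid, and the resulting symbol sequence is scored against one discrete cyclic HMM per action with the scaled forward algorithm. All function names, the toy centroids, the transition probability `p_stay`, and the emission tables are hypothetical. The model parameters are set by hand here, whereas the paper trains them with Baum-Welch re-estimation.

```python
import math


def cyclic_transitions(n_states, p_stay=0.5):
    """Left-to-right transition matrix whose last state loops back to the first.

    p_stay is a hypothetical self-transition probability, not a value from the paper.
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        A[i][i] = p_stay
        A[i][(i + 1) % n_states] += 1.0 - p_stay
    return A


def quantize(vec, centroids):
    """Return the index of the nearest centroid (squared Euclidean distance)."""
    def dist(k):
        return sum((v - c) ** 2 for v, c in zip(vec, centroids[k]))
    return min(range(len(centroids)), key=dist)


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_p += math.log(scale)
    return log_p


def classify(obs, models):
    """Pick the action whose model gives the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Toy setup (all values are assumptions chosen for illustration).
centroids = [[0.0], [1.0], [2.0]]          # stand-in for k-means cluster centers
features = [[0.1], [0.9], [2.2], [-0.1], [1.1], [1.8]]  # stand-in for per-frame GRFs
symbols = [quantize(f, centroids) for f in features]    # [0, 1, 2, 0, 1, 2]

pi = [1.0 / 3] * 3
A = cyclic_transitions(3)
# State i mostly emits symbol i ("forward_cycle") or symbol 2 - i ("reverse_cycle").
B_forward = [[0.8 if s == i else 0.1 for s in range(3)] for i in range(3)]
B_reverse = [[0.8 if s == 2 - i else 0.1 for s in range(3)] for i in range(3)]
models = {
    "forward_cycle": (pi, A, B_forward),
    "reverse_cycle": (pi, A, B_reverse),
}
best = classify(symbols, models)  # "forward_cycle"
```

Because the last state transitions back to the first, the same model can score a sequence that contains any number of repetitions of the action cycle, which is the property the cyclic topology is meant to provide for quasi-periodic actions.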
