2D Human Pose Estimation based on Object Detection using RGB-D information

  • Park, Seohee (Department of Computer Science, Kyonggi University) ;
  • Ji, Myunggeun (Department of Computer Science, Kyonggi University) ;
  • Chun, Junchul (Department of Computer Science, Kyonggi University)
  • Received : 2017.10.21
  • Accepted : 2017.12.13
  • Published : 2018.02.28

Abstract

In recent years, video surveillance research has become able to recognize various pedestrian behaviors and to analyze the overall situation of objects by combining image analysis technology with deep learning methods. Human Activity Recognition (HAR), an important issue in video surveillance research, aims to detect abnormal pedestrian behavior in CCTV environments. To recognize human behavior, it is necessary to detect the human in the image and to estimate the pose of the detected human. In this paper, we propose a novel approach to 2D human pose estimation based on object detection using RGB-D information. RGB information alone has limitations in object detection because it lacks topological information; by adding depth information to it, we improve detection accuracy. Subsequently, the rescaled region of the detected object is fed to Convolutional Pose Machines (CPM), a sequential prediction structure based on Convolutional Neural Networks. We use CPM to generate belief maps that predict the positions of the keypoints representing human body parts, and we estimate the human pose by detecting 14 key body points. The experimental results show that the proposed method detects target objects robustly under occlusion. Providing an accurately detected region as the input to the CPM also makes 2D human pose estimation possible. As future work, we will estimate the 3D human pose by mapping the 2D coordinates of the body parts onto 3D space, which can provide useful human behavior information for HAR research.
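
The step from belief maps to body keypoints can be illustrated with a minimal sketch. It assumes that each keypoint is read off as the highest-scoring pixel of its belief map and is then mapped from the rescaled detection region back to full-image coordinates. This peak-picking rule is a common way to decode belief maps, but it is an assumption here and is not confirmed as the authors' procedure. The function names `peak_location` and `keypoints_in_image` and the `(x, y, width, height)` box layout are hypothetical.

```python
from typing import List, Sequence, Tuple

BeliefMap = Sequence[Sequence[float]]  # one H x W map of per-pixel confidence


def peak_location(belief_map: BeliefMap) -> Tuple[int, int, float]:
    """Return (x, y, score) of the highest-confidence pixel in one belief map."""
    best_x, best_y, best_score = 0, 0, float("-inf")
    for y, row in enumerate(belief_map):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y, best_score


def keypoints_in_image(
    belief_maps: Sequence[BeliefMap],
    box: Tuple[int, int, int, int],
) -> List[Tuple[float, float, float]]:
    """Map each belief-map peak back to full-image coordinates.

    box is the detected person region (x, y, width, height) in the original
    image. Each belief map is assumed to cover exactly that region after
    rescaling (an assumption of this sketch, not a detail from the paper).
    """
    box_x, box_y, box_w, box_h = box
    keypoints = []
    for belief_map in belief_maps:
        map_h, map_w = len(belief_map), len(belief_map[0])
        x, y, score = peak_location(belief_map)
        # Undo the rescaling, using pixel centers of the belief map.
        image_x = box_x + (x + 0.5) * box_w / map_w
        image_y = box_y + (y + 0.5) * box_h / map_h
        keypoints.append((image_x, image_y, score))
    return keypoints
```

With 14 belief maps, one per body point, `keypoints_in_image` would return 14 `(x, y, score)` tuples; the score can be thresholded to discard low-confidence points.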
