Robust 2D human upper-body pose estimation with fully convolutional network

  • Lee, Seunghee (Department of Civil and Environmental Engineering, Korea Advanced Institute of Science and Technology)
  • Koo, Jungmo (Department of Civil and Environmental Engineering, Korea Advanced Institute of Science and Technology)
  • Kim, Jinki (Department of Civil and Environmental Engineering, Korea Advanced Institute of Science and Technology)
  • Myung, Hyun (Department of Civil and Environmental Engineering, Korea Advanced Institute of Science and Technology)
  • Received : 2018.04.28
  • Accepted : 2018.05.11
  • Published : 2018.06.25

Abstract

Demand for human pose estimation is growing in applications such as human-computer interaction and human activity recognition, and numerous approaches have been proposed to detect the 2D poses of people in images more efficiently. Despite many years of research, estimating human poses from images with satisfactory accuracy remains difficult. In this study, we propose a robust 2D human body pose estimation method using an RGB camera sensor. The method is efficient and cost-effective because an RGB camera is inexpensive compared with the high-priced sensors more commonly used for this task. To estimate upper-body joint positions, we apply semantic segmentation with a fully convolutional network. From the acquired RGB images, the network produces joint heatmaps, from which the coordinates of each joint are estimated. The network architecture is designed to learn and detect joint locations through sequential prediction. The proposed method was tested and validated for efficient estimation of the human upper-body pose. The results show the potential of a simple RGB camera sensor for human pose estimation applications.
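The abstract does not say how joint coordinates are read from the heatmaps. As a minimal sketch, assuming each joint has one heatmap and its location is taken as the highest-scoring pixel (a common decoding, not necessarily the authors' own), the step could look like this. The names `joints_from_heatmaps` and `gaussian_heatmap` are hypothetical helpers, and the heatmaps here are synthetic rather than network output.

```python
import numpy as np


def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Build a synthetic heatmap with a Gaussian peak at pixel (cx, cy)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def joints_from_heatmaps(heatmaps):
    """Decode a (J, H, W) heatmap stack into one (x, y, score) tuple per joint.

    Assumption: the joint location is the argmax of its heatmap.
    """
    heatmaps = np.asarray(heatmaps, dtype=float)
    num_joints, height, width = heatmaps.shape
    # Flatten each heatmap so a single argmax finds the peak pixel.
    flat = heatmaps.reshape(num_joints, -1)
    peak_index = flat.argmax(axis=1)
    # Convert flat indices back to (row, column) = (y, x) coordinates.
    ys, xs = np.unravel_index(peak_index, (height, width))
    scores = flat[np.arange(num_joints), peak_index]
    return [(int(x), int(y), float(s)) for x, y, s in zip(xs, ys, scores)]


if __name__ == "__main__":
    # Two synthetic "joints" on a 32 x 48 image.
    stack = np.stack([
        gaussian_heatmap(32, 48, cx=10, cy=20),
        gaussian_heatmap(32, 48, cx=40, cy=5),
    ])
    for joint_id, (x, y, score) in enumerate(joints_from_heatmaps(stack)):
        print(joint_id, x, y, round(score, 3))
```

A plain argmax is limited to the heatmap resolution. If the heatmaps are smaller than the input image, the decoded coordinates would also need to be scaled back to image size.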

Acknowledgement

Grant : Research on Adaptive Machine Learning Technology Development for Intelligent Autonomous Digital Companion

Supported by : Ministry of Trade, Industry, and Energy (MOTIE), IITP
