http://dx.doi.org/10.7472/jksii.2020.21.3.113

Real-time Human Pose Estimation using RGB-D images and Deep Learning  

Rim, Beanbonyka (Department of Computer Science, Soonchunhyang University)
Sung, Nak-Jun (Department of Computer Science, Soonchunhyang University)
Ma, Jun (Department of Computer Science, Soonchunhyang University)
Choi, Yoo-Joo (Department of Newmedia, Seoul Media Institute of Technology)
Hong, Min (Department of Computer Software Engineering, Soonchunhyang University)
Publication Information
Journal of Internet Computing and Services / v.21, no.3, 2020, pp. 113-121
Abstract
Human Pose Estimation (HPE), which localizes human body joints, has high potential for high-level applications in computer vision. The main challenges of real-time HPE are occlusion, illumination change, and the diversity of pose appearance. A single RGB image can be fed into an HPE framework to reduce computation cost, using a depth-independent device such as a common camera, webcam, or phone camera. However, HPE based on a single RGB image cannot solve the above challenges because of the inherent characteristics of color and texture. On the other hand, depth information, which is fed into an HPE framework to detect human body parts in 3D coordinates, can help solve these challenges. However, depth-based HPE requires a depth-dependent device, which is subject to space constraints and is costly. In particular, the result of depth-based HPE is less reliable because it requires pose initialization and its frame tracking is less stable. Therefore, this paper proposes a new HPE method that is robust to self-occlusion. Many human body parts can be occluded by other body parts; however, this paper focuses only on head self-occlusion. The new method combines the RGB image-based HPE framework with the depth information-based HPE framework. We evaluated the performance of the proposed method using the COCO Object Keypoint Similarity library. By taking advantage of both the RGB image-based and the depth information-based HPE methods, our RGB-D-based HPE method achieved an mAP of 0.903 and an mAR of 0.938. These results show that our method outperforms both RGB-based HPE and depth-based HPE.
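For readers unfamiliar with the evaluation metric named above, the following is a minimal sketch of how Object Keypoint Similarity (OKS) is commonly computed for one person, following the standard COCO definition. It is not the authors' evaluation code. The function name, argument layout, and the sample coordinates and sigma values are illustrative assumptions. In the COCO protocol, mAP and mAR are then obtained by thresholding OKS at several levels and averaging.

```python
import math


def object_keypoint_similarity(pred, truth, visibility, sigmas, area):
    """OKS between predicted and ground-truth keypoints of one person.

    pred, truth: lists of (x, y) pixel coordinates, one pair per keypoint.
    visibility:  ground-truth flags; keypoints with flag 0 are unlabeled and skipped.
    sigmas:      per-keypoint falloff constants (hypothetical values in the demo below).
    area:        ground-truth object area in pixels squared (the squared scale).
    """
    total = 0.0
    labeled = 0
    for (px, py), (tx, ty), v, sigma in zip(pred, truth, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - tx) ** 2 + (py - ty) ** 2   # squared pixel distance
        k2 = (2.0 * sigma) ** 2                # per-keypoint constant, squared
        total += math.exp(-d2 / (2.0 * area * k2))
        labeled += 1
    return total / labeled if labeled else 0.0


# Illustrative data: three keypoints, the last one unlabeled.
truth = [(100.0, 50.0), (120.0, 80.0), (0.0, 0.0)]
visibility = [2, 2, 0]
sigmas = [0.026, 0.079, 0.079]
area = 4000.0

perfect = object_keypoint_similarity(truth, truth, visibility, sigmas, area)
shifted = object_keypoint_similarity(
    [(103.0, 50.0), (120.0, 84.0), (0.0, 0.0)], truth, visibility, sigmas, area
)
print(perfect, shifted)
```

A prediction that matches the ground truth exactly scores 1.0, and the score decays toward 0 as keypoints drift. Because the distance is normalized by object area, the same pixel error is penalized less on a larger person.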
Keywords
Human pose estimation; human skeleton tracking; keypoint localization; deep learning