[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5909/JBE.2022.27.2.174

Multi-view Semi-supervised Learning-based 3D Human Pose Estimation

Kim, Do Yeop (Department of Electronics and Communications Engineering, Kwangwoon University)
Chang, Ju Yong (Department of Electronics and Communications Engineering, Kwangwoon University)

Publication Information

Journal of Broadcast Engineering / v.27, no.2, 2022, pp. 174-184

Abstract

3D human pose estimation models fall into two classes: multi-view models and single-view models. Multi-view models generally estimate poses more accurately than single-view models. Improving the accuracy of a single-view model requires a large amount of training data, but 3D pose annotations are difficult to obtain. To address this problem, we propose a method that uses a multi-view model to generate pseudo ground truths for multi-view human pose data and then uses these pseudo ground truths to train a single-view model. We also propose a multi-view consistency loss function that accounts for the consistency of the poses estimated from multi-view images, and we show that this loss helps train single-view models effectively. Experiments on the Human3.6M and MPI-INF-3DHP datasets show that the proposed method is effective for training single-view 3D human pose estimation models.
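The abstract does not give the formula for the multi-view consistency loss, so the following is only a minimal sketch of one plausible form, not the authors' definition. It assumes that the single-view model outputs root-relative 3D poses in each camera's coordinate frame and that the world-to-camera rotation of every view is known. The names `multiview_consistency_loss` and `rot_z` are hypothetical and exist only for this illustration. Each per-view pose is rotated into the shared world frame, and the loss is the mean squared distance of each joint from its across-view mean.

```python
import numpy as np


def multiview_consistency_loss(poses_cam, rotations):
    """Illustrative consistency loss (an assumed form, not the paper's).

    poses_cam: (V, J, 3) root-relative poses, one per camera frame.
    rotations: (V, 3, 3) world-to-camera rotations, x_cam = R @ x_world.
    """
    # Undo each camera rotation: for row vectors, x_world = x_cam @ R.
    poses_world = np.einsum("vjc,vcd->vjd", poses_cam, rotations)
    # Across-view mean pose acts as the consensus estimate.
    mean_pose = poses_world.mean(axis=0, keepdims=True)
    # Mean squared joint distance from the consensus.
    return float(np.mean(np.sum((poses_world - mean_pose) ** 2, axis=-1)))


def rot_z(theta):
    """Rotation about the z axis (hypothetical helper for the demo)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Synthetic demo: one 17-joint world pose observed from four cameras.
rng = np.random.default_rng(0)
world_pose = rng.normal(size=(17, 3))
rotations = np.stack([rot_z(t) for t in (0.0, 0.7, 1.9, 3.1)])
poses_cam = np.einsum("jd,vcd->vjc", world_pose, rotations)  # x_cam = R x_world

consistent_loss = multiview_consistency_loss(poses_cam, rotations)

# Perturb the first view so it disagrees with the other three.
perturbed = poses_cam.copy()
perturbed[0] += 0.1
inconsistent_loss = multiview_consistency_loss(perturbed, rotations)
```

In this sketch, predictions that agree across views give a loss of essentially zero, while a disagreeing view raises it. Because the term needs no 3D annotations, a loss of this kind could in principle be applied to unlabeled multi-view images alongside a supervised term on the pseudo ground truths.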

Keywords

3D human pose estimation; Semi-supervised learning; Multi-view consistency; Deep learning

