http://dx.doi.org/10.5909/JBE.2022.27.2.174

Multi-view Semi-supervised Learning-based 3D Human Pose Estimation  

Kim, Do Yeop (Department of Electronics and Communications Engineering, Kwangwoon University)
Chang, Ju Yong (Department of Electronics and Communications Engineering, Kwangwoon University)
Publication Information
Journal of Broadcast Engineering / v.27, no.2, 2022, pp. 174-184
Abstract
3D human pose estimation models can be classified into multi-view and single-view models. In general, multi-view models achieve better pose estimation performance than single-view models. Improving the 3D pose estimation performance of a single-view model requires a large amount of training data, but annotations for training 3D pose estimation models are not easy to obtain. To address this problem, we propose a method that generates pseudo ground-truths for multi-view human pose data using a multi-view model and uses the resulting pseudo ground-truths to train a single-view model. In addition, we propose a multi-view consistency loss function that considers the consistency of poses estimated from multi-view images, and show that the proposed loss helps train single-view models effectively. Experiments on the Human3.6M and MPI-INF-3DHP datasets show that the proposed method is effective for training single-view 3D human pose estimation models.
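The two training signals described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulations, so everything here is an assumption made for illustration: the function names, the camera convention `x_cam = R @ x_world + t`, the use of known extrinsics, the mean per-joint distance as the error measure, and the weight `lam` are all hypothetical and may differ from the paper's actual losses.

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix about the z-axis (helper used only to build toy cameras)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def cam_to_world(pose_cam, R, t):
    """Map (J, 3) camera-frame joints to the world frame.

    Assumes x_cam = R @ x_world + t, so x_world = R.T @ (x_cam - t);
    for row vectors this is (pose_cam - t) @ R.
    """
    return (pose_cam - t) @ R

def pseudo_label_loss(pred_cam, pseudo_world, R, t):
    """Mean per-joint distance between a single-view prediction and the
    pseudo ground-truth (from a multi-view model) moved into that camera."""
    pseudo_cam = pseudo_world @ R.T + t
    return float(np.linalg.norm(pred_cam - pseudo_cam, axis=-1).mean())

def multiview_consistency_loss(poses_cam, rotations, translations):
    """Mean per-joint distance of each view's world-frame pose from the
    cross-view mean pose; zero when all views agree."""
    world = np.stack([
        cam_to_world(p, R, t)
        for p, R, t in zip(poses_cam, rotations, translations)
    ])  # shape (V, J, 3)
    mean_pose = world.mean(axis=0, keepdims=True)
    return float(np.linalg.norm(world - mean_pose, axis=-1).mean())

def total_loss(poses_cam, pseudo_world, rotations, translations, lam=1.0):
    """Pseudo-label term averaged over views plus a weighted consistency term.
    The weight lam is a hypothetical hyperparameter."""
    label_term = np.mean([
        pseudo_label_loss(p, pseudo_world, R, t)
        for p, R, t in zip(poses_cam, rotations, translations)
    ])
    consistency = multiview_consistency_loss(poses_cam, rotations, translations)
    return float(label_term + lam * consistency)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pseudo_world = rng.normal(size=(17, 3))           # toy 17-joint pseudo ground-truth
    Rs = [rot_z(0.0), rot_z(np.pi / 2)]               # two toy cameras
    ts = [np.array([0.0, 0.0, 5.0]), np.array([1.0, 0.0, 4.0])]
    preds = [pseudo_world @ R.T + t + 0.01 * rng.normal(size=(17, 3))
             for R, t in zip(Rs, ts)]                 # noisy single-view predictions
    print(total_loss(preds, pseudo_world, Rs, ts, lam=0.5))
```

In this sketch the consistency term needs no labels at all, only the per-view predictions and camera poses, which is what makes a loss of this kind usable on unannotated multi-view data.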
Keywords
3D human pose estimation; Semi-supervised learning; Multi-view consistency; Deep learning
References
1 D. Mehta, H. Rhodin, D. Casas, P. Fua, O. Sotnychenko, W. Xu, and C. Theobalt, "Monocular 3D human pose estimation in the wild using improved CNN supervision," IEEE International Conference on 3D Vision, pp. 506-516, 2017. doi: https://doi.org/10.1109/3DV.2017.00064
2 J. C. Gower, "Generalized procrustes analysis," Psychometrika, vol. 40, no. 2, 1975. doi: https://doi.org/10.1007/BF02291478
3 D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," International Conference on Learning Representations, 2015. doi: https://doi.org/10.48550/arXiv.1412.6980
4 A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in pytorch," Advances in Neural Information Processing Systems Workshops, 2017. https://openreview.net/pdf?id=BJJsrmfCZ
5 A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, pp. 1097-1105, 2012. doi: https://doi.org/10.1145/3065386
6 G. Pavlakos, X. Zhou, K. G. Derpanis, and K. Daniilidis, "Coarse-to-fine volumetric prediction for single-image 3D human pose," IEEE Conference on Computer Vision and Pattern Recognition, 2017. doi: https://doi.org/10.1109/CVPR.2017.139
7 U. Iqbal, P. Molchanov, and J. Kautz, "Weakly-supervised 3d human pose learning via multi-view images in the wild," IEEE Conference on Computer Vision and Pattern Recognition, 2020. doi: https://doi.org/10.1109/CVPR42600.2020.00529
8 K. Iskakov, E. Burkov, V. Lempitsky, and Y. Malkov, "Learnable triangulation of human pose," IEEE International Conference on Computer Vision, 2019. doi: https://doi.org/10.1109/ICCV.2019.00781
9 M. Kocabas, S. Karagoz, and E. Akbas, "Self-supervised learning of 3d human pose using multi-view geometry," IEEE Conference on Computer Vision and Pattern Recognition, 2019. doi: https://doi.org/10.1109/CVPR.2019.00117
10 R. Hartley and A. Zisserman, Multiple view geometry in computer vision, Cambridge university press, 2003. doi: https://doi.org/10.1017/CBO9780511811685
11 X. Sun, B. Xiao, F. Wei, S. Liang, and Y. Wei, "Integral human pose regression," European Conference on Computer Vision, 2018. doi: https://doi.org/10.1007/978-3-030-01231-1_33
12 J. Martinez, R. Hossain, J. Romero, and J. J. Little, "A simple yet effective baseline for 3d human pose estimation," IEEE International Conference on Computer Vision, 2017. doi: https://doi.org/10.1109/ICCV.2017.288
13 H. Ci, C. Wang, X. Ma, and Y. Wang, "Optimizing network structure for 3D human pose estimation," IEEE International Conference on Computer Vision, 2019. doi: https://doi.org/10.1109/ICCV.2019.00235
14 A. Zeng, X. Sun, F. Huang, M. Liu, Q. Xu, and S. Lin, "SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach," European Conference on Computer Vision, 2020. doi: https://doi.org/10.1007/978-3-030-58568-6_30
15 A. Newell, K. Yang, and J. Deng, "Stacked hourglass networks for human pose estimation," European Conference on Computer Vision, 2016. doi: https://doi.org/10.1007/978-3-319-46484-8_29
16 K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016. doi: https://doi.org/10.1109/CVPR.2016.90
17 C. Ionescu, D. Papava, V. Olaru, and C. Sminchisescu, "Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 7, pp. 1325-1339, 2013. doi: https://doi.org/10.1109/TPAMI.2013.248