Multi-view Semi-supervised Learning-based 3D Human Pose Estimation

  • Kim, Do Yeop (Department of Electronics and Communications Engineering, Kwangwoon University) ;
  • Chang, Ju Yong (Department of Electronics and Communications Engineering, Kwangwoon University)
  • Received : 2022.01.17
  • Accepted : 2022.02.15
  • Published : 2022.03.30

Abstract

3D human pose estimation models can be classified into multi-view and single-view models. In general, multi-view models achieve better pose estimation performance than single-view models. Improving the 3D pose estimation performance of a single-view model requires a large amount of training data, but annotations for training 3D pose estimation models are not easy to obtain. To address this problem, we propose a method that generates pseudo ground truths for multi-view human pose data with a multi-view model and uses them to train a single-view model. In addition, we propose a multi-view consistency loss function that accounts for the consistency of poses estimated from multi-view images, and we show that this loss helps train single-view models effectively. Experiments on the Human3.6M and MPI-INF-3DHP datasets show that the proposed method is effective for training single-view 3D human pose estimation models.

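The two training signals described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss definitions, so everything below is an assumption for illustration only: poses are root-relative lists of 3D joints, each camera's world-to-camera rotation is known, the consistency term is taken as the mean squared deviation of each view's world-frame pose from the cross-view mean, and the function names (to_world, multiview_consistency_loss, pseudo_label_loss) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pose = List[Vec3]                   # J joints, each (x, y, z), root-relative
Mat3 = Sequence[Sequence[float]]    # 3x3 rotation, assumed world -> camera


def to_world(pose_cam: Pose, R: Mat3) -> Pose:
    """Rotate a camera-frame pose into the world frame: x_w = R^T x_c."""
    return [
        tuple(sum(R[k][i] * joint[k] for k in range(3)) for i in range(3))
        for joint in pose_cam
    ]


def multiview_consistency_loss(poses_cam: List[Pose], rotations: List[Mat3]) -> float:
    """Assumed form: mean squared distance of each view's world-frame pose
    to the mean pose over all views (zero when all views agree)."""
    world = [to_world(p, R) for p, R in zip(poses_cam, rotations)]
    n_views, n_joints = len(world), len(world[0])
    loss = 0.0
    for j in range(n_joints):
        mean_j = [sum(w[j][i] for w in world) / n_views for i in range(3)]
        for w in world:
            loss += sum((w[j][i] - mean_j[i]) ** 2 for i in range(3))
    return loss / (n_views * n_joints)


def pseudo_label_loss(pred: Pose, pseudo_gt: Pose) -> float:
    """Assumed form: mean squared joint error between a single-view prediction
    and a pseudo ground truth produced by a multi-view model (same frame)."""
    total = sum(
        sum((p[i] - g[i]) ** 2 for i in range(3)) for p, g in zip(pred, pseudo_gt)
    )
    return total / len(pred)


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    rot_z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  # world -> camera for view 2
    # The same world joint (1, 0, 0) as seen from both cameras.
    views = [[(1.0, 0.0, 0.0)], [(0.0, 1.0, 0.0)]]
    print(multiview_consistency_loss(views, [identity, rot_z_90]))
```

In a training loop, the two terms would typically be combined as a weighted sum, with the weight being a tunable hyperparameter; the actual formulation and weighting used by the authors are not stated in the abstract.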
Acknowledgement

This work was partly supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00348, Development of A Cloud-based Video Surveillance System for Unmanned Store Environments using Integrated 2D/3D Video Analysis, 90%) and by the Excellent researcher support project of Kwangwoon University in 2021 (10%).
