Hard Example Generation by Novel View Synthesis for 3-D Pose Estimation

Hard Example Generation Based on Arbitrary-Viewpoint Synthesis for Improving the Performance of 3D Pose Estimation

  • Minji Kim (The Department of Computer Science and Artificial Intelligence, Jeonbuk National University)
  • Sungchan Kim (The Department of Computer Science and Artificial Intelligence, Jeonbuk National University, Center for Advanced Image Information Technology, Jeonbuk National University)
  • Received : 2023.12.06
  • Accepted : 2023.12.20
  • Published : 2024.02.28

Abstract

Acquiring datasets for 3D human pose estimation (HPE) is expensive, and augmentation techniques borrowed from conventional visual recognition tasks are of limited effectiveness for it. We address these difficulties with a simple but effective method that augments input images in terms of viewpoint when training a 3D HPE model. Our intuition is that meaningful variants of an input image can be obtained by viewing the human instance in it from an arbitrary viewpoint different from the original one. The core idea is to synthesize new images that keep the pose of the original example but, because the changed viewpoint introduces self-occlusion, are difficult to predict. We incorporate this idea into the training procedure of the 3D HPE model as an augmentation stage for the input samples. We show that the augmentation strategy must be designed carefully in two respects: how frequently the augmentation is performed and which viewpoints are selected for synthesizing the samples. To this end, we propose a new metric that measures the prediction difficulty of an input image for 3D HPE in terms of the distance between corresponding keypoints on both sides of the human body. Extensive exploration of the augmentation probability and of example selection according to the proposed distance metric yields a performance gain of up to 6.2% on Human3.6M, a widely used pose estimation dataset.
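The abstract does not give the exact form of the difficulty metric or the augmentation schedule, so the following is only a minimal sketch of how such a scheme might look, under stated assumptions. It assumes that the metric is the mean image-plane distance between left and right joint pairs, that a smaller distance indicates more self-occlusion and therefore a harder example, that candidate viewpoints are rotations about the vertical axis, and that projection is orthographic. The joint indices and all function names (`left_right_distance`, `hardest_view`, `should_augment`) are hypothetical and not taken from the paper.

```python
import math
import random

# Hypothetical joint layout: (left, right) index pairs, e.g. shoulders and hips.
LEFT_RIGHT_PAIRS = [(0, 1), (2, 3)]


def rotate_about_vertical(pose_3d, yaw_deg):
    """Rotate 3D joints (x, y, z) about the vertical y-axis by yaw_deg degrees."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in pose_3d]


def left_right_distance(pose_3d, pairs=LEFT_RIGHT_PAIRS):
    """Mean 2D distance between paired left/right joints.

    Assumption: orthographic projection (the z coordinate is dropped).
    """
    total = 0.0
    for left, right in pairs:
        xl, yl, _ = pose_3d[left]
        xr, yr, _ = pose_3d[right]
        total += math.hypot(xl - xr, yl - yr)
    return total / len(pairs)


def hardest_view(pose_3d, candidate_yaws):
    """Pick the candidate yaw whose projection has the smallest left/right distance.

    Assumption: smaller distance means more self-occlusion, hence a harder example.
    """
    return min(
        candidate_yaws,
        key=lambda yaw: left_right_distance(rotate_about_vertical(pose_3d, yaw)),
    )


def should_augment(rng, p_aug):
    """Decide whether to replace a training sample with a synthesized view,
    with probability p_aug."""
    return rng.random() < p_aug
```

In a training loop, one would call `should_augment` per sample and, when it returns true, synthesize the image at the yaw returned by `hardest_view`; the view synthesis model itself is outside the scope of this sketch.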

Acknowledgement

This research was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (No. 2022R1A2C1011013).
