Recent Trends of Weakly-supervised Deep Learning for Monocular 3D Reconstruction

Trends in Weakly-supervised Artificial Intelligence Techniques for Single-image 3D Reconstruction

  • Kim, Seungryong (Department of Computer Science and Engineering, Korea University)
  • Received : 2020.11.30
  • Accepted : 2021.01.22
  • Published : 2021.01.30

Abstract

Estimating 3D information from a single image is an essential problem in numerous applications. Because a single 2D image can originate from an infinite number of different 3D scenes, 3D reconstruction from a single image is notoriously challenging. This challenge has been overcome by the advent of deep convolutional neural networks (CNNs), which model the mapping function between a 2D image and 3D information. However, training such deep CNNs requires a massive amount of training data, and such data is difficult or even impossible to acquire. Recent work thus aims to develop deep learning techniques that can be trained in a weakly-supervised manner, using metadata rather than ground-truth depth data. In this article, we introduce recent developments in weakly-supervised deep learning techniques, categorized into scene 3D reconstruction and object 3D reconstruction, and discuss their limitations and future directions.

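Techniques for recovering 3D depth information from a single 2D image are clearly of great value across academia and industry. However, because a 2D image is the projection of arbitrary 3D information, it carries an inherent depth ambiguity, and resolving this ambiguity is very challenging. This limitation is being overcome thanks to recent advances in artificial intelligence, through algorithms that learn the correspondence between 2D images and 3D depth information. Training such models requires large-scale training data that captures this correspondence, but acquiring and processing the data demands considerable labor, so it can be built only to a limited extent. Recent developments are therefore dominated by weakly-supervised artificial intelligence techniques that predict 3D depth information from large collections of 2D images and metadata. This article summarizes these developments, divided into scene 3D reconstruction and object 3D reconstruction, and discusses the limitations of current techniques and future directions.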
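
The depth ambiguity mentioned in the abstract can be illustrated with a minimal sketch of a pinhole camera. The example below is not taken from the article: the function names `project` and `scale`, the focal length, and the principal point are all illustrative assumptions. It shows that two 3D points lying on the same viewing ray, at different depths, project to the same pixel, so a single image alone cannot tell them apart.

```python
# Pinhole projection: a 3D point (X, Y, Z) in camera coordinates maps to
# pixel (u, v) = (f * X / Z + cx, f * Y / Z + cy).
# The intrinsics below (f, cx, cy) are illustrative values only.

def project(point, f=500.0, cx=320.0, cy=240.0):
    """Project a 3D camera-space point onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (f * x / z + cx, f * y / z + cy)


def scale(point, s):
    """Move a point along its viewing ray by scaling all coordinates by s."""
    return tuple(s * c for c in point)


near = (0.5, -0.25, 2.0)   # a point at depth 2
far = scale(near, 4.0)     # same viewing ray, depth 8

pixel_near = project(near)
pixel_far = project(far)

# Both points land on the same pixel although their depths differ.
print(pixel_near, pixel_far)
```

Because the scale factor cancels in the division by Z, any positive scale gives the same pixel. This is why monocular methods need some additional signal, such as learned priors or the metadata used in weakly-supervised training, to recover depth.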

Acknowledgement

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ICT Creative Consilience program (IITP-2020-0-01819) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
