SINGLE PANORAMA DEPTH ESTIMATION USING DOMAIN ADAPTATION

  • Received : 2020.06.19
  • Accepted : 2020.06.25
  • Published : 2020.07.01

Abstract

In this paper, we propose a deep learning framework for predicting the depth map of a 360° panorama image. Previous works train their networks on synthetic 360° panorama datasets due to the lack of realistic datasets. However, the synthetic nature of these datasets causes the features extracted by the networks to differ from those of real 360° panorama images, which inevitably leads previous methods to fail at depth prediction on real 360° panorama images. To address this gap, we use domain adaptation to learn features shared by real and synthetic panorama images. Experimental results show that our approach greatly improves the accuracy of depth estimation on real panorama images while achieving state-of-the-art performance on synthetic images.

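This study proposes a deep learning architecture that estimates the depth map of a 360° panorama. Previous studies used rendered 360° panorama datasets to train their networks. However, because rendered panorama datasets differ from panoramas captured in the real world, the networks from those studies could not accurately estimate depth maps for real captured panoramas. To solve this problem, we use domain adaptation so that the network learns the features shared by rendered panoramas and real captured panoramas. Experiments show that our method estimates accurate depth maps for real captured panoramas while maintaining strong performance on rendered panoramas.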
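The abstract does not state the exact training objective, so the following is only a minimal sketch of one common way to learn features shared across domains: adversarial feature alignment. It assumes a domain discriminator that outputs the probability that a feature came from a synthetic panorama, and an encoder trained with a supervised depth loss on synthetic data plus an adversarial term on real data. All function names, the L1 depth loss, and the `adv_weight` parameter are illustrative assumptions, not the authors' formulation.

```python
import math


def l1_depth_loss(pred, target):
    """Mean absolute error between predicted and ground-truth depth values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def binary_cross_entropy(prob, label, eps=1e-7):
    """Binary cross-entropy for one probability, clamped to avoid log(0)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def discriminator_loss(d_synthetic, d_real):
    """Hypothetical discriminator objective.

    d_synthetic / d_real are discriminator outputs (probability of
    "synthetic") for features of synthetic and real panoramas.
    The discriminator is trained to output 1 for synthetic, 0 for real.
    """
    loss_syn = sum(binary_cross_entropy(p, 1) for p in d_synthetic) / len(d_synthetic)
    loss_real = sum(binary_cross_entropy(p, 0) for p in d_real) / len(d_real)
    return 0.5 * (loss_syn + loss_real)


def encoder_loss(pred_depth, gt_depth, d_real, adv_weight=0.1):
    """Hypothetical encoder objective.

    Supervised depth loss on synthetic panoramas (the only ones with
    ground truth here), plus an adversarial term that rewards real-image
    features the discriminator mistakes for synthetic ones.
    adv_weight is an assumed hyperparameter.
    """
    depth_term = l1_depth_loss(pred_depth, gt_depth)
    adv_term = sum(binary_cross_entropy(p, 1) for p in d_real) / len(d_real)
    return depth_term + adv_weight * adv_term


if __name__ == "__main__":
    # Toy numbers only; they do not come from the paper.
    pred, gt = [1.0, 2.0, 3.0], [1.5, 2.5, 2.0]
    print(encoder_loss(pred, gt, d_real=[0.2, 0.3]))
    print(discriminator_loss(d_synthetic=[0.8, 0.9], d_real=[0.2, 0.3]))
```

In this sketch the two losses are minimized alternately. As the discriminator loses the ability to tell the domains apart, the encoder's features for real and synthetic panoramas become harder to distinguish, which is the sense in which they are "shared".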
