Knowledge Distillation for Unsupervised Depth Estimation

  • Received: 2022.07.04
  • Reviewed: 2022.08.03
  • Published: 2022.08.31

Abstract

This paper proposes a novel approach to training an unsupervised depth estimation algorithm. The goal of unsupervised depth estimation is to estimate pixel-wise distances from the camera without external supervision. While most previous works focus on model architectures, loss functions, and masking methods that account for dynamic objects, this paper focuses on a training framework that uses depth cues effectively. The main loss function of unsupervised depth estimation algorithms is the photometric error. In this paper, we claim that a direct depth cue is more effective than the photometric error. To obtain the direct depth cue, we adopt knowledge distillation, a teacher-student learning framework. We train a teacher network based on a previous unsupervised method and use its depth predictions as pseudo labels, which are then employed to train a student network. In experiments, the proposed algorithm shows performance comparable to the state-of-the-art algorithm, and we demonstrate that our teacher-student framework is effective for unsupervised depth estimation.
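The distillation step described above can be sketched in a few lines. This is an illustrative toy, not the paper's actual implementation: a frozen "teacher" stands in for the pretrained unsupervised depth network, its predictions serve as pseudo labels, and a "student" is trained to regress them directly (here with an L1 loss, a common choice for depth regression; the paper does not specify its loss). Both models are reduced to scalar functions so the loop stays self-contained.

```python
import numpy as np

# Toy sketch of teacher-student distillation for depth estimation.
# All model and variable names here are illustrative assumptions.

rng = np.random.default_rng(0)

def teacher_predict(x):
    # Stand-in for a pretrained unsupervised depth network (kept frozen).
    return 0.5 * x + 1.0

images = rng.uniform(0.0, 1.0, size=256)
pseudo_labels = teacher_predict(images)  # depth pseudo labels from the teacher

# Toy student: a single weight and bias, trained by subgradient descent
# on the L1 pseudo-label loss |student(x) - teacher(x)|.
w, b = 0.0, 0.0
for step in range(2000):
    lr = 0.05 / (1.0 + 0.01 * step)      # decaying step size for convergence
    sign = np.sign(w * images + b - pseudo_labels)
    w -= lr * np.mean(sign * images)     # subgradient of the L1 loss w.r.t. w
    b -= lr * np.mean(sign)              # subgradient of the L1 loss w.r.t. b

# After training, the student imitates the teacher's depth predictions,
# which is the "direct depth cue" the paper contrasts with photometric error.
```

In the actual setting, the teacher is first trained with the photometric error, after which its outputs replace that indirect signal as the student's supervision.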

Keywords

Acknowledgement

This work was supported by Samsung Electronics.
