
Benchmark for Deep Learning based Visual Odometry and Monocular Depth Estimation

  • Choi, Hyukdoo (Department of Electronics and Information Engineering)
  • Received : 2018.12.14
  • Accepted : 2019.01.25
  • Published : 2019.05.31

Abstract

This paper presents a new benchmark system for visual odometry (VO) and monocular depth estimation (MDE). As deep learning has become a key technology in computer vision, many researchers are trying to apply it to VO and MDE. Until just a couple of years ago, the two problems were studied independently in a supervised manner, but they are now coupled and trained together in an unsupervised way. However, before designing sophisticated models and losses, researchers must customize datasets for training and testing; after training, a new model must also be compared with existing models, which is a considerable burden. The benchmark provides a ready-to-use input dataset for VO and MDE research in the 'tfrecords' format, along with an output dataset that includes model checkpoints and inference results of existing models. It also provides various tools for data formatting, training, and evaluation. In the experiments, the existing models were evaluated to verify the performances reported in the corresponding papers, and we found that the evaluation results fall short of the reported performances.
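Since the abstract names the 'tfrecords' input format without detailing its schema, below is a minimal sketch of how such a dataset could be consumed with TensorFlow's tf.data API. The feature keys ('image', 'pose', 'intrinsic'), their dtypes, and the file name are assumptions made for illustration; the benchmark's own formatting tools define the actual record layout.

```python
# Minimal sketch of loading tfrecords input data with tf.data.
# NOTE: the feature names and shapes below are hypothetical -- the
# actual schema is defined by the benchmark's formatting tools.
import tensorflow as tf

# Hypothetical schema: serialized image bytes, camera intrinsics,
# and ground-truth poses stored as raw byte strings.
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "pose": tf.io.FixedLenFeature([], tf.string),
    "intrinsic": tf.io.FixedLenFeature([], tf.string),
}

def parse_example(serialized):
    # Decode one tf.train.Example into dense tensors.
    features = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_raw(features["image"], tf.uint8)
    pose = tf.io.decode_raw(features["pose"], tf.float32)
    intrinsic = tf.io.decode_raw(features["intrinsic"], tf.float32)
    return image, pose, intrinsic

# Stream training examples from a tfrecords file (path is illustrative).
dataset = (tf.data.TFRecordDataset("kitti_train.tfrecords")
           .map(parse_example)
           .batch(4))
```

If the images were serialized as encoded PNG bytes rather than raw pixels, tf.image.decode_png would replace tf.io.decode_raw; which applies depends on how the formatting tools wrote the records.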

Keywords

  19. J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, "A benchmark for the evaluation of RGB-D SLAM systems," 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura, Portugal, pp. 573-580, 2012.