Monocular Camera based Real-Time Object Detection and Distance Estimation Using Deep Learning

  • Kim, Hyunwoo (Mobility Platform Research Center, Korea Electronics Technology Institute)
  • Park, Sanghyun (Mobility Platform Research Center, Korea Electronics Technology Institute)
  • Received : 2019.10.10
  • Accepted : 2019.11.13
  • Published : 2019.11.30

Abstract

This paper proposes a model and a training method that use deep learning to detect objects and estimate their distances in real time from a monocular camera. We use the YOLOv2 model, which is applied in autonomous driving and robotics because of its fast image processing speed. We modified the loss function and trained the model so that YOLOv2 can detect objects and estimate distances at the same time. A term for learning the distance value z was added to the YOLOv2 loss function, alongside the terms for the bounding box values x, y, w, h and the classification loss. In addition, the distance term was multiplied by a weighting parameter to balance the learning. We trained the model on object location and recognition data from the camera and on distance data measured by lidar, so that the model can detect objects and estimate their distances from a monocular camera even when the vehicle is going uphill or downhill. To evaluate object detection and distance estimation, mAP (mean Average Precision) and adjusted R-squared were used, and the results were compared with previous studies. In addition, to measure speed, we compared the FPS (frames per second) of the original YOLOv2 model with that of our model.
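
The weighted distance term and the adjusted R-squared metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so the squared-error form used here is an assumption, and the names `weighted_distance_term`, `adjusted_r2`, `lambda_dist`, and `n_predictors` are hypothetical and not taken from the paper.

```python
from typing import Sequence


def weighted_distance_term(z_pred: Sequence[float],
                           z_true: Sequence[float],
                           lambda_dist: float = 1.0) -> float:
    """Sum of squared distance errors, scaled by a balancing weight.

    Assumption: a squared-error distance term, by analogy with the
    coordinate terms of the YOLO loss; the paper may use another form.
    """
    if len(z_pred) != len(z_true):
        raise ValueError("z_pred and z_true must have the same length")
    return lambda_dist * sum((p - t) ** 2 for p, t in zip(z_pred, z_true))


def adjusted_r2(y_true: Sequence[float],
                y_pred: Sequence[float],
                n_predictors: int = 1) -> float:
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more samples than n_predictors + 1")
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance")
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
```

For example, `weighted_distance_term([10.0, 20.0], [12.0, 19.0], lambda_dist=0.5)` evaluates to 2.5. In a full detector this term would be added to the box, objectness, and classification losses, with `lambda_dist` chosen so that no single term dominates training.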
