Online Hard Example Mining for Training One-Stage Object Detectors

  • Received: 2018.02.28
  • Reviewed: 2018.04.11
  • Published: 2018.05.31

Abstract

This paper proposes a new loss function and an online hard example mining scheme that improve the detection performance of one-stage object detectors based on deep convolutional neural networks. Together, they address the imbalance between the number of annotated objects and the number of background examples in the training data, and they also improve the localization accuracy of each object. As a result, one-stage detectors, which are inherently fast, can reach detection performance comparable to or higher than that of two-stage detectors. Experiments on the PASCAL VOC 2007 benchmark dataset show that the proposed loss function and online hard example mining scheme improve the performance of one-stage object detectors.
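To make the idea of mining hard examples under object/background imbalance concrete, the following is a minimal sketch of loss-ranked example selection. It is not the loss function or mining scheme proposed in the paper, whose details are not given in this abstract. It is a generic illustration that keeps every object (positive) anchor and only the highest-loss background anchors, in the style of the hard negative mining commonly used with SSD-like detectors. The function names, the `neg_pos_ratio` parameter, and the example values are all assumptions made for illustration.

```python
import numpy as np


def select_hard_examples(losses, labels, neg_pos_ratio=3):
    """Return a boolean mask over anchors (hypothetical helper).

    losses: shape (N,), per-anchor classification loss.
    labels: shape (N,), 0 for background, > 0 for an object class.
    Keeps all positives plus the highest-loss negatives, capped at
    neg_pos_ratio negatives per positive.
    """
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)
    pos = labels > 0
    neg_idx = np.flatnonzero(~pos)
    # Treat an image without positives as having one, so some negatives remain.
    num_pos = max(int(pos.sum()), 1)
    num_neg = min(neg_idx.size, neg_pos_ratio * num_pos)
    # Rank background anchors by loss, hardest (largest loss) first.
    order = np.argsort(-losses[neg_idx], kind="stable")
    mask = pos.copy()
    mask[neg_idx[order[:num_neg]]] = True
    return mask


def mined_loss(losses, labels, neg_pos_ratio=3):
    """Sum the loss over the selected anchors, normalized by the positive count."""
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)
    mask = select_hard_examples(losses, labels, neg_pos_ratio)
    num_pos = max(int((labels > 0).sum()), 1)
    return float(losses[mask].sum() / num_pos)


# Illustrative values only: one object anchor among five background anchors.
example_losses = [0.1, 2.0, 0.05, 0.9, 0.3, 1.5]
example_labels = [0, 1, 0, 0, 0, 0]
example_mask = select_hard_examples(example_losses, example_labels, neg_pos_ratio=2)
example_total = mined_loss(example_losses, example_labels, neg_pos_ratio=2)
```

With these illustrative values, the single object anchor and the two background anchors with the largest losses are kept, so easy background anchors no longer contribute to the training signal.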
