http://dx.doi.org/10.3745/KTSDE.2018.7.5.195

Online Hard Example Mining for Training One-Stage Object Detectors  

Kim, Incheol (Department of Computer Science, Kyonggi University)
Publication Information
KIPS Transactions on Software and Data Engineering / v.7, no.5, 2018, pp. 195-204
Abstract
In this paper, we propose a new loss function and an online hard example mining scheme for improving the performance of single-stage object detectors based on deep convolutional neural networks. Together, the proposed loss function and mining scheme not only mitigate the imbalance between the number of annotated objects and the number of background examples, but also improve the localization accuracy for each object. As a result, they can give intrinsically fast single-stage detectors detection performance similar to or higher than that of two-stage detectors. Experiments on the PASCAL VOC 2007 benchmark dataset show that the proposed loss function and online hard example mining scheme improve the performance of single-stage object detectors.
Keywords
One-Stage Object Detection; Deep Convolutional Neural Network; Online Hard Example Mining; Loss Function
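
The abstract does not spell out the proposed loss function or mining rule, so the following is only a generic sketch of online hard example mining for the object/background imbalance it mentions, not the paper's method. It keeps every object (positive) example and only the highest-loss background (negative) examples in a batch. The function names, the toy probabilities, and the 3:1 negative-to-positive cap are all assumptions made for illustration.

```python
import math

def cross_entropy(p_true):
    """Per-example loss: negative log of the probability given to the true class."""
    return -math.log(max(p_true, 1e-12))

def mine_hard_examples(losses, is_positive, neg_pos_ratio=3):
    """Return the indices of the examples that contribute to the batch loss.

    Keeps every positive (object) example and only the highest-loss
    negatives (background), at most neg_pos_ratio negatives per positive.
    The ratio is an illustrative assumption, not a value from the paper.
    """
    positives = [i for i, pos in enumerate(is_positive) if pos]
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    # Hardest negatives first: the ones the detector currently gets most wrong.
    negatives.sort(key=lambda i: losses[i], reverse=True)
    max_negatives = neg_pos_ratio * max(len(positives), 1)
    return sorted(positives + negatives[:max_negatives])

# Toy batch: 1 object and 7 background candidates (made-up probabilities).
p_true = [0.6, 0.99, 0.2, 0.95, 0.5, 0.98, 0.9, 0.1]
is_positive = [True, False, False, False, False, False, False, False]

losses = [cross_entropy(p) for p in p_true]
kept = mine_hard_examples(losses, is_positive, neg_pos_ratio=3)
mined_loss = sum(losses[i] for i in kept) / len(kept)
print(kept)  # indices of the examples used for the update
```

In this toy batch the four easy background candidates are dropped, so the averaged loss is dominated by the object and the three background candidates the model misclassifies most badly, rather than by the many easy negatives.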