http://dx.doi.org/10.6109/jkiice.2020.24.2.204

Grad-CAM based deep learning network for location detection of the main object  

Kim, Seon-Jin (Department of Information and Communication Engineering, Chung-buk National University)
Lee, Jong-Keun (Department of Information and Communication Engineering, Chung-buk National University)
Kwak, Nae-Jung (Department of Information and Communication Engineering, Chung-buk National University)
Ryu, Sung-Pil (Department of Information and Communication Engineering, Chung-buk National University)
Ahn, Jae-Hyeong (Department of Information and Communication Engineering, Chung-buk National University)
Abstract
In this paper, we propose an optimal deep learning network architecture for detecting the location of the main object through weakly-supervised learning. The proposed network adds convolution blocks to improve the localization accuracy of the main object. Built on VGG-16, it is extended with five additional blocks that add convolution layers. The network was trained by weakly-supervised learning, which does not require ground-truth location information for objects. In addition, Grad-CAM was used to compensate for the weakness of global average pooling (GAP) in class activation mapping (CAM), one of the weakly-supervised learning methods. When tested on the CUB-200-2011 dataset, the proposed network achieved a top-1 localization error of 50.13%. The proposed network also detects the main object more accurately than the existing method.
Keywords
Deep learning; Object detection; VGG-16; Weakly-supervised learning
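The abstract does not spell out how Grad-CAM turns a classifier into a localizer, or how the top-1 localization error is scored. The following is a minimal pure-Python sketch of those two steps, not the authors' implementation. It assumes the feature maps A^k of the last convolution layer and the gradients of the class score y^c with respect to them are already available (in practice a framework's autograd supplies them). It follows the Grad-CAM definition of Selvaraju et al.: each channel weight alpha_k^c is the spatial mean of dy^c/dA^k, and the map is ReLU(sum_k alpha_k^c A^k). Unlike CAM, which requires the network to end in GAP followed by a single linear layer, this weighting can be applied to any convolution layer. The box step is an assumed simplification: it takes the tight box around every cell scoring at least 20% of the peak, whereas Zhou et al. additionally keep only the largest connected component. The helper names (grad_cam, box_from_cam, iou, top1_loc_error) are hypothetical. An image counts as correctly localized only if its top-1 class is right and its box overlaps the ground truth with an IoU of at least 0.5.

```python
"""Minimal Grad-CAM localization sketch (pure Python; hypothetical helper names)."""

from typing import List, Optional, Sequence, Tuple

Map = List[List[float]]
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive cell coordinates


def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """Weight K feature maps A^k by alpha_k = spatial mean of dy^c/dA^k, then ReLU."""
    k_count = len(activations)
    height, width = len(activations[0]), len(activations[0][0])
    # Global-average-pool each gradient map to get one importance weight per channel.
    alphas = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    # Weighted sum over channels; ReLU keeps only evidence that raises the class score.
    cam = [
        [max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(k_count)))
         for j in range(width)]
        for i in range(height)
    ]
    # Normalise to [0, 1] so a relative threshold can be applied.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def box_from_cam(cam: Map, threshold: float = 0.2) -> Optional[Box]:
    """Tight box around every cell whose normalised score is >= threshold (assumed 20%)."""
    hits = [(i, j) for i, row in enumerate(cam) for j, v in enumerate(row) if v >= threshold]
    if not hits:
        return None
    rows = [i for i, _ in hits]
    cols = [j for _, j in hits]
    return (min(cols), min(rows), max(cols), max(rows))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes given in inclusive cell coordinates."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = iw * ih

    def area(r: Box) -> int:
        return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)

    return inter / (area(a) + area(b) - inter)


def top1_loc_error(samples: Sequence[Tuple[int, int, Optional[Box], Box]]) -> float:
    """Fraction of images whose top-1 class is wrong or whose box has IoU < 0.5."""
    wrong = 0
    for pred_cls, true_cls, pred_box, gt_box in samples:
        ok = pred_cls == true_cls and pred_box is not None and iou(pred_box, gt_box) >= 0.5
        wrong += 0 if ok else 1
    return wrong / len(samples)


if __name__ == "__main__":
    # Toy 4x4 maps: channel 0 covers the top-left 2x2 patch, channel 1 the rest.
    a0 = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    a1 = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
    g0 = [[1.0] * 4 for _ in range(4)]   # raising channel 0 raises the class score
    g1 = [[-1.0] * 4 for _ in range(4)]  # raising channel 1 lowers it
    box = box_from_cam(grad_cam([a0, a1], [g0, g1]))
    print(box)                               # (0, 0, 1, 1)
    print(round(iou(box, (0, 0, 2, 2)), 3))  # 0.444
    samples = [(3, 3, box, (0, 0, 1, 1)), (3, 3, box, (0, 0, 2, 2)), (5, 3, box, (0, 0, 1, 1))]
    print(round(top1_loc_error(samples), 3))  # 0.667
```

In the toy run, the second sample fails because its IoU of 4/9 falls below 0.5, and the third fails on classification, giving an error of 2/3. With real data, the last-layer feature maps are much coarser than the input image, so the box would be rescaled to image coordinates before computing IoU.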