http://dx.doi.org/10.6109/jkiice.2020.24.12.1581

Object Detection Model Using Attention Mechanism  

Kim, Geun-Sik (Department of Information Convergence Engineering, Pusan National University)
Bae, Jung-Soo (College of Software Convergence, Dongseo University)
Cha, Eui-Young (Department of Computer Engineering, Pusan National University)
Abstract
With the emergence of convolutional neural networks (CNNs) in machine learning, models for solving image processing problems have developed rapidly. However, the computing resources they require are also growing, which makes it difficult to train them in a typical computing environment. The attention mechanism was originally proposed to prevent the vanishing-gradient problem in recurrent neural networks, but it can also be used to make CNNs easier to train. In this paper, an attention mechanism is applied to a CNN, and the advantages of the proposed method are demonstrated by comparing its training time and performance with those of models without it. In YOLO-based object detection, the proposed model outperformed models without the attention mechanism in both training time and performance, and the experiments showed that training time can be reduced significantly. This is also expected to make machine learning more accessible to end users.
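The abstract does not state which attention module is inserted into the YOLO network. As a hedged illustration only, the sketch below implements Squeeze-and-Excitation style channel attention (Hu et al., reference 5), one of the modules the paper cites; it is not the authors' method. The function names, the reduction ratio `r`, and the dependency-free nested-list layout `[channel][row][col]` are assumptions made to keep the example self-contained.

```python
import math
import random
from typing import List, Tuple

Tensor3D = List[List[List[float]]]  # assumed layout: [channel][row][col]
Matrix = List[List[float]]


def global_avg_pool(x: Tensor3D) -> List[float]:
    """Squeeze step: average each channel over all of its spatial positions."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def linear(v: List[float], w: Matrix, b: List[float]) -> List[float]:
    """Fully connected layer: out[i] = sum_j w[i][j] * v[j] + b[i]."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(w, b)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def se_channel_attention(
    x: Tensor3D, w1: Matrix, b1: List[float], w2: Matrix, b2: List[float]
) -> Tuple[Tensor3D, List[float]]:
    """SE-style channel attention (hypothetical sketch, not the paper's code).

    w1 has shape (C/r x C) and compresses the channel descriptor;
    w2 has shape (C x C/r) and expands it back to one weight per channel.
    Returns the rescaled feature map and the per-channel weights in (0, 1).
    """
    s = global_avg_pool(x)                          # (C,) channel descriptor
    h = [max(0.0, z) for z in linear(s, w1, b1)]    # ReLU bottleneck, (C/r,)
    a = [sigmoid(z) for z in linear(h, w2, b2)]     # attention weights, (C,)
    # Excitation step: multiply every value in channel c by its weight a[c].
    scaled = [[[a_c * v for v in row] for row in ch] for a_c, ch in zip(a, x)]
    return scaled, a


if __name__ == "__main__":
    random.seed(0)
    C, H, W, r = 4, 3, 3, 2  # hypothetical sizes and reduction ratio
    x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w1 = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C // r)]
    w2 = [[random.gauss(0, 0.5) for _ in range(C // r)] for _ in range(C)]
    y, weights = se_channel_attention(x, w1, [0.0] * (C // r), w2, [0.0] * C)
    print([round(a, 3) for a in weights])
```

The design choice here is that attention costs only two small fully connected layers per block: the network learns to emphasize informative channels and suppress others, rather than adding convolutional capacity. CBAM (reference 6) additionally applies a spatial attention map, and ECA-Net (reference 7) replaces the two fully connected layers with a 1D convolution.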
Keywords
Machine learning; Object detection; Attention mechanism; CNN (Convolutional neural network)
References
1 J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," University of Washington, Washington: WA, Technical Report, 2018.
2 K. Xu, J. Ba, R. Kiros, K. Cho, and A. Courville, "Show, attend and tell: Neural image caption generation with visual attention," in International Conference on Machine Learning, France: FR, pp. 2048-2057, 2015.
3 H. Nam, J. Ha, and J. Kim, "Dual attention networks for multimodal reasoning and matching," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii: HI, pp. 299-307, 2017.
4 F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, and X. Tang, "Residual attention network for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii: HI, pp. 3156-3164, 2017.
5 J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Utah: UT, pp. 7132-7141, 2018.
6 S. Woo, J. Park, J. Lee, and I. S. Kweon, "Convolutional block attention module," in Proceedings of the European Conference on Computer Vision (ECCV), Germany: DE, pp. 3-19, 2018.
7 Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, "ECA-Net: Efficient channel attention for deep convolutional neural networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, pp. 11534-11542, 2020.
8 Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, "Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression," in Proceedings of the AAAI Conference on Artificial Intelligence, New York: NY, vol. 34, no. 7, pp. 12993-13000, 2020.
9 K. He, G. Gkioxari, P. Dollar, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, Venice: IT, pp. 2961-2969, 2017.
10 H. Qassim, A. Verma, and D. Feinzimer, "Compressed residual-VGG16 CNN model for big data places image recognition," in 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), Nevada: NV, pp. 169-175, 2018.
11 K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nevada: NV, pp. 770-778, 2016.
12 G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii: HI, pp. 4700-4708, 2017.
13 T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, "Focal loss for dense object detection," in Proceedings of the IEEE International Conference on Computer Vision, Venice: IT, pp. 2980-2988, 2017.
14 H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, "Generalized intersection over union: A metric and a loss for bounding box regression," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, California: CA, pp. 658-666, 2019.
15 A. Mittal, A. Zisserman, and P. Torr, Hand Dataset [Internet]. Available: http://www.robots.ox.ac.uk/~vgg/data/hands/.
16 T. Dozat, "Incorporating nesterov momentum into adam," in ICLR 2016 workshop submission, Puerto Rico: PR, 2016.
17 D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proceedings of the 3rd International Conference on Learning Representations (ICLR), California: CA, pp. 1-15, 2015.