Modified YOLOv4S Based on Deep Learning with Feature Fusion and Spatial Attention

  • 황범연 (Kwangwoon University, Department of Plasma Bio Display)
  • 이상훈 (Kwangwoon University, Ingenium College)
  • 이승현 (Kwangwoon University, Ingenium College)
  • Received : 2021.09.06
  • Accepted : 2021.12.20
  • Published : 2021.12.28

Abstract

This paper proposes a modified YOLOv4S that applies feature fusion and spatial attention to detect small and occluded objects. The conventional YOLOv4S is a lightweight network, so its feature extraction capability is weaker than that of deeper networks. First, the proposed method uses feature fusion to combine feature maps of different scales, enriching both semantic and low-level information; in addition, dilated convolution expands the receptive field, which improves detection accuracy for small and occluded objects. Second, spatial attention refines the existing spatial information so that neighbouring objects are better separated, which improves detection accuracy for occluded objects. The PASCAL VOC and COCO datasets were used for quantitative evaluation. Compared with the conventional YOLOv4S, the proposed method improved mAP by 2.7% on PASCAL VOC and by 1.8% on COCO.
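
The abstract does not specify how the feature maps of different scales are fused. As a minimal sketch, assuming the pattern common in FPN/PANet-style necks (the coarser, more semantic map is upsampled by nearest-neighbour interpolation to the finer map's resolution and the two are concatenated along the channel axis), the fusion step can be written in plain Python with feature maps stored as nested lists of shape [channels][height][width]. The function names are hypothetical, and a real network would follow the concatenation with learned convolutions.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channels][height][width]


def upsample_nearest_2x(fmap: FeatureMap) -> FeatureMap:
    """Double height and width by repeating each value in a 2x2 block."""
    out: FeatureMap = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))  # repeat the row (a copy, not an alias)
        out.append(rows)
    return out


def fuse_features(fine: FeatureMap, coarse: FeatureMap) -> FeatureMap:
    """Hypothetical fusion: upsample the coarse map, then concatenate channels."""
    up = upsample_nearest_2x(coarse)
    if (len(up[0]), len(up[0][0])) != (len(fine[0]), len(fine[0][0])):
        raise ValueError("spatial sizes must match after upsampling")
    # Low-level (fine) channels first, then the upsampled semantic channels.
    return [[row[:] for row in ch] for ch in fine] + up


# Usage: a 1-channel 4x4 fine map fused with a 2-channel 2x2 coarse map.
fine = [[[float(i * 4 + j) for j in range(4)] for i in range(4)]]
coarse = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
fused = fuse_features(fine, coarse)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 4 4
```

Concatenation is assumed here because it keeps both sources intact for the following layers to weigh; element-wise addition is the other common choice and would require matching channel counts.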
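
The receptive-field gain from dilated convolution follows from simple arithmetic: a kernel with k taps and dilation rate d spans k + (k - 1)(d - 1) pixels per side while keeping only k weights per side. The sketch below computes this span, the receptive field of a stack of stride-1 layers, and a 1-D dilated convolution that shows which inputs are sampled. The dilation rates (1, 2, 4) are illustrative assumptions; the abstract does not state which rates the proposed method uses.

```python
from typing import List, Sequence, Tuple


def effective_kernel_size(k: int, d: int) -> int:
    """Span of a k-tap kernel whose taps are d pixels apart."""
    return k + (k - 1) * (d - 1)


def receptive_field(layers: Sequence[Tuple[int, int, int]]) -> int:
    """Receptive field of stacked conv layers given as (kernel, stride, dilation)."""
    rf, jump = 1, 1  # jump = input-pixel distance between adjacent outputs
    for k, stride, d in layers:
        rf += (effective_kernel_size(k, d) - 1) * jump
        jump *= stride
    return rf


def dilated_conv1d(x: List[float], w: List[float], d: int) -> List[float]:
    """'Valid' 1-D dilated convolution (cross-correlation, as in CNN libraries)."""
    span = effective_kernel_size(len(w), d)
    return [sum(w[j] * x[i + j * d] for j in range(len(w)))
            for i in range(len(x) - span + 1)]


print(receptive_field([(3, 1, 1)] * 3))                    # 7
print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))  # 15
print(dilated_conv1d([0, 1, 2, 3, 4, 5, 6], [1, 1, 1], d=2))  # [6, 9, 12]
```

With the same nine weights per 3x3 kernel, three stacked layers see a 15x15 region when dilated with rates 1, 2 and 4, versus 7x7 without dilation, at no extra parameter cost.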
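
The abstract does not describe the spatial attention module itself. One widely used formulation, the spatial attention branch of CBAM (Woo et al., 2018), pools the feature map across channels with average and max pooling, convolves the two resulting maps with a small kernel, applies a sigmoid, and rescales every channel by the resulting height x width weight map. The sketch below implements that formulation in plain Python under the assumption that the proposed module is similar; the kernel weights are placeholders rather than learned values, and the function name is hypothetical.

```python
import math
from typing import List, Tuple

Map2D = List[List[float]]
FeatureMap = List[Map2D]  # [channels][height][width]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spatial_attention(fmap: FeatureMap, kernel: List[Map2D],
                      bias: float = 0.0) -> Tuple[FeatureMap, Map2D]:
    """CBAM-style spatial attention; kernel has shape [2][k][k], k odd."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    k = len(kernel[0])
    pad = k // 2  # zero padding keeps the output at h x w
    # 1) Pool across channels: an average map and a max map.
    avg = [[sum(fmap[ch][i][j] for ch in range(c)) / c for j in range(w)]
           for i in range(h)]
    mx = [[max(fmap[ch][i][j] for ch in range(c)) for j in range(w)]
          for i in range(h)]
    pooled = (avg, mx)
    # 2) 2-in, 1-out k x k convolution (cross-correlation), then sigmoid.
    att: Map2D = []
    for i in range(h):
        row = []
        for j in range(w):
            s = bias
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            s += kernel[p][di][dj] * pooled[p][y][x]
            row.append(_sigmoid(s))
        att.append(row)
    # 3) Rescale every channel by the same spatial weight map.
    out = [[[fmap[ch][i][j] * att[i][j] for j in range(w)] for i in range(h)]
           for ch in range(c)]
    return out, att


# Usage: a 2-channel 3x3 map with one strong activation at the centre.
fmap = [[[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]],
        [[0.0] * 3 for _ in range(3)]]
kernel = [[[1.0]], [[1.0]]]  # placeholder 1x1 weights, not learned values
out, att = spatial_attention(fmap, kernel)
print(round(att[1][1], 4), att[0][0])  # 0.9975 0.5
```

Because one weight map is shared by all channels, the module emphasises where an object is rather than which features describe it, which is the property the abstract relies on for separating occluded objects.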
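
The reported gains are in mAP, the mean over classes of average precision (AP). As a sketch of the PASCAL VOC-style computation (all-point interpolation, as used from VOC2010 onwards), assume the detections for one class are already sorted by descending confidence and each has been labelled a true or false positive by matching it to a ground-truth box at IoU of at least 0.5. The COCO metric additionally averages AP over IoU thresholds from 0.50 to 0.95 and uses 101-point interpolation, so this sketch illustrates the VOC metric only. The function names are illustrative.

```python
from typing import Dict, List


def average_precision(is_tp: List[bool], num_gt: int) -> float:
    """VOC-style AP (all-point interpolation) for one class.

    is_tp: per-detection flags, sorted by descending confidence.
    num_gt: number of ground-truth objects of this class.
    """
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_tp:
        tp += hit
        fp += not hit
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Pad the curve, then make precision monotonically non-increasing.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall changes.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1) if mrec[i + 1] != mrec[i])


def mean_average_precision(per_class: Dict[str, float]) -> float:
    """mAP: unweighted mean of per-class AP values."""
    return sum(per_class.values()) / len(per_class)


ap_a = average_precision([True, True], num_gt=2)         # 1.0
ap_b = average_precision([True, False, True], num_gt=2)  # 5/6
print(round(mean_average_precision({"a": ap_a, "b": ap_b}), 4))  # 0.9167
```

A single false positive ranked above a true positive lowers that class's AP from 1.0 to about 0.833, which shows why better localisation of small and occluded objects translates directly into mAP.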
