A Study on Lightweight Model with Attention Process for Efficient Object Detection

  • Park, Chan-Soo (Dept. of Plasma Bio Display, KwangWoon University) ;
  • Lee, Sang-Hun (Ingenium College of Liberal Arts, KwangWoon University) ;
  • Han, Hyun-Ho (College of General Education, University of Ulsan)
  • Received : 2021.04.21
  • Reviewed : 2021.05.20
  • Published : 2021.05.28

Abstract


In this paper, a lightweight network with fewer parameters than existing object detection methods is proposed. Current detection models greatly increase network complexity to improve accuracy. The proposed network therefore uses EfficientNet as its feature extraction network, and forms the subsequent layers into a pyramid structure to exploit both low-level detail features and high-level semantic features. An attention process applied between the pyramid levels suppresses noise that is unnecessary for prediction. All convolution operations in the network are replaced with depth-wise and point-wise convolutions to minimize computation. The proposed network was trained and evaluated on the PASCAL VOC dataset. Experiments showed that the fused features, after the refinement process, are robust for various objects. Compared with CNN-based detection models, detection accuracy improved with a smaller amount of computation. As future work, it appears necessary to adjust the anchor ratios to match object sizes.
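The abstract names two building blocks without giving implementation detail: depth-wise/point-wise (separable) convolution and an attention process between pyramid levels. The following NumPy sketch illustrates both under stated assumptions; all function names, shapes, and the sigmoid gating form are illustrative choices, not the paper's actual implementation:

```python
import numpy as np

def depthwise_conv(x, kernels):
    """Depth-wise 3x3 convolution: each input channel is filtered
    independently by its own kernel (no cross-channel mixing)."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    out = np.zeros((c, h - k + 1, w - k + 1))
    for ch in range(c):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[ch, i, j] = np.sum(x[ch, i:i + k, j:j + k] * kernels[ch])
    return out

def pointwise_conv(x, weights):
    """Point-wise 1x1 convolution: mixes channels at each spatial
    position; `weights` has shape (out_channels, in_channels)."""
    return np.tensordot(weights, x, axes=([1], [0]))

def attention_gate(low, high):
    """Sigmoid attention gate: the high-level (semantic) feature map
    weights the low-level (detail) feature map, suppressing responses
    unlikely to matter for prediction. Assumes `high` has already been
    upsampled to the spatial size of `low`."""
    gate = 1.0 / (1.0 + np.exp(-high))   # per-position weights in (0, 1)
    return low * gate

# Why the substitution reduces parameters: for a 3x3 convolution
# mapping 64 -> 128 channels,
standard_params = 64 * 128 * 3 * 3          # 73,728 weights
separable_params = 64 * 3 * 3 + 64 * 128    # 8,768 weights (~8.4x fewer)
```

Replacing each standard convolution with the depth-wise/point-wise pair is what keeps the pyramid and attention stages lightweight; the gate above is the simplest possible form, and the paper's attention process may combine additional channel or spatial terms.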
