A Study on Attention Mechanism in DeepLabv3+ for Deep Learning-based Semantic Segmentation

  • Shin, SeokYong (Department of Plasma Bio Display, Kwangwoon University) ;
  • Lee, SangHun (Ingenium College of Liberal Arts, Kwangwoon University) ;
  • Han, HyunHo (College of General Education, University of Ulsan)
  • Received : 2021.09.06
  • Accepted : 2021.10.20
  • Published : 2021.10.28

Abstract

In this paper, we propose a DeepLabv3+-based encoder-decoder model that uses an attention mechanism for precise semantic segmentation. DeepLabv3+ is a deep learning-based semantic segmentation method that is mainly used in applications such as autonomous vehicles and infrared image analysis. The conventional DeepLabv3+ makes little use of the encoder's intermediate feature maps in the decoder, which causes information loss during restoration. This restoration loss reduces segmentation accuracy. The proposed method therefore first minimizes the restoration loss by using one additional intermediate feature map. To use this additional feature map effectively, the feature maps are fused hierarchically, starting from the smallest one. Finally, an attention mechanism is applied to the decoder to maximize its ability to fuse the intermediate feature maps. We evaluated the proposed method on the Cityscapes dataset, which is commonly used in street scene image segmentation research. The experimental results show that the proposed method improves segmentation results compared with the conventional DeepLabv3+. The proposed method can therefore be used in applications that require high accuracy.
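The decoder idea described in the abstract (fuse the encoder's feature maps hierarchically, starting from the smallest one, and apply attention after each fusion) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the helper names, the output strides (16, 8, 4), nearest-neighbour upsampling, and a parameter-free sigmoid channel gate standing in for whatever attention module the authors actually use. The convolutions that a real decoder would apply after each concatenation are omitted.

```python
import math

# A feature map is a nested list indexed as [channel][row][col].
# All names below are hypothetical; this is not the authors' implementation.

def make_map(channels, height, width, value):
    """Build a constant feature map that stands in for a real encoder output."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]

def shape(fmap):
    """Return (channels, height, width)."""
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling (assumed; bilinear is also common)."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out

def concat(a, b):
    """Channel-wise concatenation of two maps with equal spatial size."""
    assert shape(a)[1:] == shape(b)[1:], "spatial sizes must match"
    return a + b

def channel_attention(fmap):
    """Assumed stand-in for attention: scale each channel by
    sigmoid(global average of that channel). No learned weights."""
    out = []
    for channel in fmap:
        count = len(channel) * len(channel[0])
        mean = sum(sum(row) for row in channel) / count
        gate = 1.0 / (1.0 + math.exp(-mean))
        out.append([[v * gate for v in row] for row in channel])
    return out

def fuse_hierarchically(deep, skips):
    """Fuse from the smallest map upward; `skips` is ordered small to large."""
    x = deep
    for skip in skips:
        x = upsample2x(x)                       # match the next skip's size
        x = channel_attention(concat(x, skip))  # fuse, then re-weight channels
    return x

# Toy shapes: deepest encoder output plus two intermediate feature maps.
deep_out = make_map(4, 2, 2, 1.0)  # stride 16 (assumed)
mid_skip = make_map(2, 4, 4, 0.5)  # stride 8, the additional map (assumed)
low_skip = make_map(1, 8, 8, 0.0)  # stride 4 (assumed)

fused = fuse_hierarchically(deep_out, [mid_skip, low_skip])
print(shape(fused))  # (7, 8, 8)
```

Ordering `skips` from small to large means each upsampling step only has to bridge a factor of two before the next intermediate map is merged, which is one plausible reading of "hierarchical" fusion in the abstract.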
