
Automated Object Management Technique for Shipyard External Storage Yards Using a Semantic Segmentation Foundation Model

Segmentation Foundation Model-based Automated Yard Management Algorithm

  • 정민규 (Department of Information Convergence Engineering (Major in AI), Pusan National University);
  • 노정현 (Department of Information Convergence Engineering (Major in AI), Pusan National University);
  • 김장현 (Department of Information Convergence Engineering (Major in AI), Pusan National University);
  • 하성헌 (Department of Information Convergence Engineering (Major in AI), Pusan National University);
  • 강태선 (Samsung Heavy Industries);
  • 이병학 (Samsung Heavy Industries);
  • 강기룡 (Samsung Heavy Industries);
  • 김준현 (Samsung Heavy Industries);
  • 박진선 (School of Computer Science and Engineering, Pusan National University)
  • Received: 2023.12.18
  • Accepted: 2024.01.30
  • Published: 2024.02.29

Abstract

In shipyards, aerial images of external storage yards are acquired at regular intervals using Unmanned Aerial Vehicles (UAVs), and the status of the yards is assessed by human inspection of these images. This approach requires considerable time and manpower, especially over large areas. In this paper, we propose an automated management technique based on a pre-trained semantic segmentation foundation model to address these challenges and accurately assess the status of external storage yards. Because publicly available datasets covering the parts and equipment found in shipyard external storage yards are scarce, we also built a small-scale external storage yard object dataset to generate the object prompts required by the segmentation foundation model. Using this dataset, we fine-tune an object detector to extract initial object candidates, which are then used as prompts for the Segment Anything Model (SAM) to obtain precise semantic segmentation results. Furthermore, to enable continuous collection of storage yard data, we propose a training data generation pipeline based on SAM. The proposed method achieves 4.00%p higher performance on average than previous semantic segmentation methods, and 5.08%p higher performance than SegFormer.
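For illustration only, the following Python sketch shows one way the pipeline summarized above could be wired together: a fine-tuned YOLOv8 detector [15] proposes bounding boxes, each box is passed as a prompt to SAM [14], and the resulting masks are rasterized into a semantic label map. The weight paths, checkpoint name, confidence threshold, and class-id handling are assumptions made for this example, not the authors' exact implementation.

```python
# Sketch: detector boxes as SAM prompts (hypothetical file names and settings).
import cv2
import numpy as np
from ultralytics import YOLO                                    # YOLOv8 detector [15]
from segment_anything import sam_model_registry, SamPredictor   # SAM [14]

def segment_yard_objects(image_path: str,
                         detector_weights: str = "yard_yolov8.pt",      # assumed fine-tuned weights
                         sam_checkpoint: str = "sam_vit_h_4b8939.pth",  # public SAM ViT-H checkpoint
                         conf_thres: float = 0.5) -> np.ndarray:
    """Detect yard objects, refine each box into a mask with SAM, return a label map."""
    # 1) Initial object candidates from the fine-tuned detector.
    detector = YOLO(detector_weights)
    det = detector.predict(image_path, conf=conf_thres, verbose=False)[0]
    boxes = det.boxes.xyxy.cpu().numpy()              # (N, 4) in XYXY format
    classes = det.boxes.cls.cpu().numpy().astype(int)  # (N,) class indices

    # 2) Each detected box becomes a prompt for SAM.
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image)

    # 3) Rasterize the masks into a single semantic label map (0 = background).
    label_map = np.zeros(image.shape[:2], dtype=np.uint8)
    for box, cls_id in zip(boxes, classes):
        masks, _, _ = predictor.predict(box=box, multimask_output=False)
        label_map[masks[0]] = cls_id + 1
    return label_map
```

In this sketch the detector supplies only coarse localization while SAM produces the pixel-accurate masks; the resulting label maps could also be stored as pseudo-labels to grow the training set, in the spirit of the data generation pipeline described in the abstract.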

Keywords

Acknowledgements

This research was supported by the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean government (Ministry of Trade, Industry and Energy) (P0017006, HRD Program for Industrial Innovation, 2023) and by the ITRC (Information Technology Research Center) support program (IITP-2023-RS-2023-00260098) funded by the Ministry of Science and ICT and supervised by the Institute for Information & Communications Technology Planning & Evaluation (IITP). This work was also supported by Samsung Heavy Industries Co., Ltd.

References

  1. J. Yang, H. Li, J. Zou, J. Junzhi, S. Jiang, and R. Li, et al., "Concrete Crack Segmentation based on UAV-enabled Edge Computing," Neurocomputing, Vol. 485, pp. 233-241, 2022. https://doi.org/10.1016/j.neucom.2021.03.139
  2. H. Kim, J. Kim, S. Jung, and C. Sim, "Implementation of YOLO based Missing Person Search AI Application System," Smart Media Journal, Vol. 12, No. 9, pp. 159-170, 2023.
  3. L. Wang, R. Li, D. Wang, C. Duan, T. Wang, and X. Meng, "Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images," Remote Sensing, Vol. 13, No. 16, 2021.
  4. Zhangruirui, Youjie, D. Kim, S. Lee, and J. Lee, "Searching Damaged Pine Trees by Wilt Disease Based on Deep Learning Using Multispectral Image," Smart Media Journal, Vol. 45, No. 11, pp. 1823-1830, 2020.
  5. H. Myung, S. Kim, K. Choi, D. Kim, G. Lee, and H. Ahn, et al., "Diagnosis of the Rice Lodging for the UAV Image using Vision Transformer," Smart Media Journal, Vol. 12, No. 9, pp. 28-37, 2023. https://doi.org/10.30693/SMJ.2023.12.9.28
  6. O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional Networks for Biomedical Image Segmentation," International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234-241, 2015.
  7. H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid Scene Parsing Network," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2881-2890, 2017.
  8. S. Park, and Y. S. Heo, "Multi-Path Feature Fusion Module for Semantic Segmentation," Journal of Korea Multimedia Society, Vol. 24, No. 1, pp. 1-12, 2021. https://doi.org/10.9717/KMMS.2020.24.1.001
  9. R. Strudel, R. Garcia, I. Laptev, and C. Schmid, "Segmenter: Transformer for Semantic Segmentation," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7262-7272, 2021.
  10. E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers," Advances in Neural Information Processing Systems (NIPS), pp. 12077-12090, 2021.
  11. Y. Lyu, G. Vosselman, G. S. Xia, A. Yilmaz, and M. Y. Yang, "UAVid: A Semantic Segmentation Dataset for UAV Imagery," ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 165, pp. 108-119, 2020. https://doi.org/10.1016/j.isprsjprs.2020.05.009
  12. Z. Shao, K. Yang, and W. Zhou, "Performance Evaluation of Single-Label and Multi-Label Remote Sensing Image Retrieval Using a Dense Labeling Dataset," Remote Sensing, Vol. 10, No. 6, pp. 964-976, 2018.
  13. Z. Shao, W. Zhou, X. Deng, M. Zhang, and Q. Cheng, "Multilabel Remote Sensing Image Retrieval based on Fully Convolutional Network," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 13, pp. 318-328, 2020. https://doi.org/10.1109/JSTARS.2019.2961634
  14. A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, and L. Gustafson, et al., "Segment Anything," arXiv preprint arXiv:2304.02643, doi: https://doi.org/10.48550/arXiv.2304.02643, 2023.
  15. Yolov8, https://github.com/ultralytics/ultralytics, 2023 (accessed August 14, 2023).
  16. J. Long, E. Shelhamer, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431-3440, 2015.
  17. Y. Yuan, X. Chen, and J. Wang, "Object-Contextual Representations for Semantic Segmentation," Proceedings of European Conference on Computer Vision (ECCV), pp. 173-190, 2020.
  18. L. C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking Atrous Convolution for Semantic Image Segmentation," arXiv preprint arXiv:1706.05587, doi: https://doi.org/10.48550/arXiv.1706.05587, 2017.
  19. R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580-587, 2014.
  20. R. Girshick, "Fast R-CNN," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1440-1448, 2015.
  21. S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Advances in Neural Information Processing Systems (NIPS), 2015.
  22. K. He, G. Gkioxari, P. Dollar, and R. Girshick, "Mask R-CNN," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2961-2969, 2017.
  23. P. Sun, R. Zhang, Y. Jiang, T. Kong, C. Xu, and W. Zhan, "Sparse R-CNN: End-to-End Object Detection with Learnable Proposals," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14454-14463, 2021.
  24. X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable DETR: Deformable Transformers for End-to-End Object Detection," arXiv preprint arXiv:2010.04159, doi: https://doi.org/10.48550/arXiv.2010.04159, 2020.
  25. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779-788, 2016.
  26. J. Redmon, and A. Farhadi, "Yolov3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, doi: https://doi.org/10.48550/arXiv.1804.02767, 2018.
  27. A. Bochkovskiy, C. Y. Wang, and H. Y. Liao, "Yolov4: Optimal Speed and Accuracy of Object Detection," arXiv preprint arXiv:2004.10934, doi: https://doi.org/10.48550/arXiv.2004.10934, 2020.
  28. C. Y. Wang, A. Bochkovskiy, and H. Y. Liao, "Yolov7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detection," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7464-7475, 2023. 
  29. K. He, X. Chen, S. Xie, Y. Li, P. Dollar, and R. Girshick, "Masked Autoencoders are Scalable Vision Learners," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16000-16009, 2022.
  30. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, and T. Unterthiner, et al., "An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale," arXiv preprint arXiv:2010.11929, doi: https://doi.org/10.48550/arXiv.2010.11929, 2020.
  31. T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, and D. Ramanan, et al., "Microsoft COCO: Common Objects in Context," Proceedings of European Conference on Computer Vision (ECCV), pp. 740-755, 2014.
  32. J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, and Y. Zhao, et al., "Deep High-Resolution Representation Learning for Visual Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 43, No. 10, pp. 3349-3364, 2020.
  33. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, and Z. Zhang, et al., "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10012-10022, 2021.
  34. T. Xiao, Y. Liu, B. Zhou, Y. Jiang, and J. Sun, "Unified Perceptual Parsing for Scene Understanding," Proceedings of European Conference on Computer Vision (ECCV), pp. 418-434, 2018.