Acknowledgement
This research was supported by the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean government (Ministry of Trade, Industry and Energy) (P0017006, 2023 HRD Program for Industrial Innovation) and by the ITRC (Information Technology Research Center) support program (IITP-2023-RS-2023-00260098) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation) and funded by the Ministry of Science and ICT. This research was also supported by Samsung Heavy Industries Co., Ltd.
References
- J. Yang, H. Li, J. Zou, J. Junzhi, S. Jiang, and R. Li, et al., "Concrete Crack Segmentation Based on UAV-enabled Edge Computing," Neurocomputing, Vol. 485, pp. 233-241, 2022. https://doi.org/10.1016/j.neucom.2021.03.139
- H. Kim, J. Kim, S. Jung, and C. Sim, "Implementation of YOLO based Missing Person Search AI Application System," Smart Media Journal, Vol. 12, No. 9, pp. 159-170,
- L. Wang, R. Li, D. Wang, C. Duan, T. Wang, and X. Meng, "Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images," Remote Sensing, Vol. 13, No. 16, 2021.
- Zhang Ruirui, You Jie, D. Kim, S. Lee, and J. Lee, "Searching Damaged Pine Trees by Wilt Disease Based on Deep Learning Using Multispectral Image," Smart Media Journal, Vol. 45, No. 11, pp. 1823-1830, 2020.
- H. Myung, S. Kim, K. Choi, D. Kim, G. Lee, and H. Ahn, et al., "Diagnosis of the Rice Loading for the UAV Image using Vision Transformer," Smart Media Journal, Vol. 12, No. 9, pp. 28-37, 2023. https://doi.org/10.30693/SMJ.2023.12.9.28
- O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional Networks for Biomedical Image Segmentation," International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234-241, 2015.
- H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid Scene Parsing Network," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2881-2890, 2017.
- S. Park, and Y. S. Heo, "Multi-Path Feature Fusion Module for Semantic Segmentation," Journal of Korea Multimedia Society, Vol. 24, No. 1, pp. 1-12, 2021. https://doi.org/10.9717/KMMS.2020.24.1.001
- R. Strudel, R. Garcia, I. Laptev, and C. Schmid, "Segmenter: Transformer for Semantic Segmentation," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7262-7272, 2021.
- E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers," Advances in Neural Information Processing Systems (NIPS), pp. 12077-12090, 2021.
- Y. Lyu, G. Vosselman, G. S. Xia, A. Yilmaz, and M. Y. Yang, "UAVid: A Semantic Segmentation Dataset for UAV Imagery," ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 165, pp. 108-119, 2020. https://doi.org/10.1016/j.isprsjprs.2020.05.009
- Z. Shao, K. Yang, and W. Zhou, "Performance Evaluation of Single-Label and Multi-Label Remote Sensing Image Retrieval Using a Dense Labeling Dataset," Remote Sensing, Vol. 10, No. 6, pp. 964-976,
- Z. Shao, W. Zhou, X. Deng, M. Zhang, and Q. Cheng, "Multilabel Remote Sensing Image Retrieval Based on Fully Convolutional Network," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 13, pp. 318-328, 2020. https://doi.org/10.1109/JSTARS.2019.2961634
- A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, and L. Gustafson, et al., "Segment Anything," arXiv preprint arXiv:2304.02643, doi: https://doi.org/10.48550/arXiv.2304.02643, 2023.
- Ultralytics YOLOv8, https://github.com/ultralytics/ultralytics, 2023 (accessed August 14, 2023).
- J. Long, E. Shelhamer, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431-3440, 2015.
- Y. Yuan, X. Chen, and J. Wang, "Object-Contextual Representations for Semantic Segmentation," Proceedings of European Conference on Computer Vision (ECCV), pp. 173-190, 2020.
- L. C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking Atrous Convolution for Semantic Segmentation," arXiv preprint arXiv:1706.05587, doi: https://doi.org/10.48550/arXiv.1706.05587, 2017.
- R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580-587, 2014.
- R. Girshick, "Fast R-CNN," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1440-1448, 2015.
- S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Advances in Neural Information Processing Systems (NIPS), 2015.
- K. He, G. Gkioxari, P. Dollar, and R. Girshick, "Mask R-CNN," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2961-2969, 2017.
- P. Sun, R. Zhang, Y. Jiang, T. Kong, C. Xu, and W. Zhan, "Sparse R-CNN: End-to-End Object Detection with Learnable Proposals," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14454-14463, 2021.
- X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable DETR: Deformable Transformers for End-to-End Object Detection," arXiv preprint arXiv:2010.04159, doi: https://doi.org/10.48550/arXiv.2010.04159, 2020.
- J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779-788, 2016.
- J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, doi: https://doi.org/10.48550/arXiv.1804.02767, 2018.
- A. Bochkovskiy, C. Y. Wang, and H. Y. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv preprint arXiv:2004.10934, doi: https://doi.org/10.48550/arXiv.2004.10934, 2020.
- C. Y. Wang, A. Bochkovskiy, and H. Y. Liao, "YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7464-7475, 2023.
- K. He, X. Chen, S. Xie, Y. Li, P. Dollar, and R. Girshick, "Masked Autoencoders are Scalable Vision Learners," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16000-16009, 2022.
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, and T. Unterthiner, et al., "An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale," arXiv preprint arXiv:2010.11929, doi: https://doi.org/10.48550/arXiv.2010.11929, 2020.
- T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, and D. Ramanan, et al., "Microsoft COCO: Common Objects in Context," Proceedings of European Conference on Computer Vision (ECCV), pp. 740-755, 2014.
- J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, and Y. Zhao, et al., "Deep High-Resolution Representation Learning for Visual Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 43, No. 10, pp. 3349-3364, 2020.
- Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, and Z. Zhang, et al., "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10012-10022, 2021.
- T. Xiao, Y. Liu, B. Zhou, Y. Jiang, and J. Sun, "Unified Perceptual Parsing for Scene Understanding," Proceedings of European Conference on Computer Vision (ECCV), pp. 418-434, 2018.