• Title/Summary/Keyword: Swin model

Search Result 12, Processing Time 0.023 seconds

The Detection of Multi-class Vehicles using Swin Transformer (Swin Transformer를 이용한 항공사진에서 다중클래스 차량 검출)

  • Lee, Ki-chun;Jeong, Yu-seok;Lee, Chang-woo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2021.10a
    • /
    • pp.112-114
    • /
    • 2021
  • To assess urban conditions, the number of vehicles and the traffic flow are essential factors to identify. This paper improves on the detection capability reported in previous studies by using the Swin Transformer model, which has shown higher performance than existing convolutional neural networks. Various vehicle types are learned with the existing Mask R-CNN, and the now widely used Transformer model is introduced to detect specific types of vehicles in urban aerial images.

Semantic Segmentation of the Habitats of Ecklonia Cava and Sargassum in Undersea Images Using HRNet-OCR and Swin-L Models (HRNet-OCR과 Swin-L 모델을 이용한 조식동물 서식지 수중영상의 의미론적 분할)

  • Kim, Hyungwoo;Jang, Seonwoong;Bak, Suho;Gong, Shinwoo;Kwak, Jiwoo;Kim, Jinsoo;Lee, Yangwon
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.5_3
    • /
    • pp.913-924
    • /
    • 2022
  • In this paper, we present the construction of a database of undersea images for the habitats of Ecklonia cava and Sargassum and conduct a semantic segmentation experiment using state-of-the-art (SOTA) models: High Resolution Network-Object Contextual Representation (HRNet-OCR) and Shifted Windows-L (Swin-L). Our segmentation models outperformed the existing experiments, with a 29% increase in mean intersection over union (mIoU). The Swin-L model performed better for every class. In particular, the Ecklonia cava class, which had little data, was also extracted appropriately by the Swin-L model. Owing to the Transformer backbone, target objects and backgrounds were distinguished better than with the legacy models. A larger database, now under construction, is expected to improve accuracy further and can be used as a deep learning database for undersea images.

Cloud Detection from Sentinel-2 Images Using DeepLabV3+ and Swin Transformer Models (DeepLabV3+와 Swin Transformer 모델을 이용한 Sentinel-2 영상의 구름탐지)

  • Kang, Jonggu;Park, Ganghyun;Kim, Geunah;Youn, Youjeong;Choi, Soyeon;Lee, Yangwon
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_2
    • /
    • pp.1743-1747
    • /
    • 2022
  • Sentinel-2 can be used as proxy data for the Korean Compact Advanced Satellite 500-4 (CAS500-4), also known as Agriculture and Forestry Satellite, in terms of spectral wavelengths and spatial resolution. This letter examined cloud detection for later use in the CAS500-4 based on deep learning technologies. DeepLabV3+, a traditional Convolutional Neural Network (CNN) model, and Shifted Windows (Swin) Transformer, a state-of-the-art (SOTA) Transformer model, were compared using 22,728 images provided by Radiant Earth Foundation (REF). Swin Transformer showed a better performance with a precision of 0.886 and a recall of 0.875, which is a balanced result, unbiased between over- and under-estimation. Deep learning-based cloud detection is expected to be a future operational module for CAS500-4 through optimization for the Korean Peninsula.
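The precision and recall figures quoted above can be read against the following minimal sketch. It is not the authors' evaluation code: the function name `precision_recall`, the flat binary mask layout (1 = cloud, 0 = clear), and the toy data are assumptions made for illustration only.

```python
def precision_recall(pred, truth):
    """Pixel-wise precision and recall for flat binary masks (1 = cloud)."""
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # cloud correctly detected
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # clear pixel flagged as cloud
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # cloud pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy masks (hypothetical data, not from the paper).
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
precision, recall = precision_recall(pred, truth)
print(precision, recall)
```

Low precision means many false alarms (over-estimation of cloud cover) and low recall means many missed clouds (under-estimation), so two values that are close to each other, as in the abstract, suggest neither error type dominates.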

Detection of video editing points using facial keypoints (얼굴 특징점을 활용한 영상 편집점 탐지)

  • Joshep Na;Jinho Kim;Jonghyuk Park
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.4
    • /
    • pp.15-30
    • /
    • 2023
  • Recently, various services using artificial intelligence (AI) have been emerging in the media field as well. However, most video editing, which involves finding an editing point and joining video segments, is still carried out manually, requiring a lot of time and human resources. Therefore, this study proposes a methodology that uses Video Swin Transformer to detect the edit points of a video according to whether the person in the video is speaking. To this end, the proposed structure first detects facial keypoints through face alignment. Through this process, the temporal and spatial changes of the face are reflected from the input video data. The behavior of the person in the video is then classified by the Video Swin Transformer-based model proposed in this study. Specifically, the feature map generated by Video Swin Transformer from the video data is combined with the facial keypoints detected through face alignment, and utterance is classified through convolution layers. In conclusion, the performance of the proposed video editing point detection model using facial keypoints improved from 87.46% to 89.17% compared to the model without facial keypoints.

Waterbody Detection for the Reservoirs in South Korea Using Swin Transformer and Sentinel-1 Images (Swin Transformer와 Sentinel-1 영상을 이용한 우리나라 저수지의 수체 탐지)

  • Soyeon Choi;Youjeong Youn;Jonggu Kang;Seoyeon Kim;Yemin Jeong;Yungyo Im;Youngmin Seo;Wanyub Kim;Minha Choi;Yangwon Lee
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_3
    • /
    • pp.949-965
    • /
    • 2023
  • In this study, we propose a method to monitor the surface area of agricultural reservoirs in South Korea using Sentinel-1 synthetic aperture radar images and the deep learning model Swin Transformer. Utilizing the Google Earth Engine platform, datasets from 2017 to 2021 were constructed for seven agricultural reservoirs, categorized into 700 K-ton, 900 K-ton, and 1.5 M-ton capacities. For four of the reservoirs, a total of 1,283 images were used for model training through shuffling and 5-fold cross-validation techniques. Upon evaluation, the Swin Transformer Large model, configured with a window size of 12, demonstrated superior semantic segmentation performance, showing an average accuracy of 99.54% and a mean intersection over union (mIoU) of 95.15% for all folds. When the best-performing model was applied to the datasets of the remaining three reservoirs for validation, it achieved an accuracy of over 99% and mIoU of over 94% for all reservoirs. These results indicate that the Swin Transformer model can effectively monitor the surface area of agricultural reservoirs in South Korea.
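As a rough illustration of the two metrics reported above, the sketch below computes pixel accuracy and mIoU for flat label lists. The function names, the two-class labeling (0 = background, 1 = water), and the toy data are assumptions for illustration and are not taken from the paper.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the reference label."""
    return sum(1 for p, t in zip(pred, truth) if p == t) / len(truth)


def mean_iou(pred, truth, num_classes):
    """Mean of the per-class intersection over union, skipping absent classes."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # class appears in the prediction or the reference
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy labels (hypothetical): 0 = background, 1 = water.
pred = [1, 1, 0, 0, 0, 1]
truth = [1, 1, 1, 0, 0, 0]
print(pixel_accuracy(pred, truth), mean_iou(pred, truth, num_classes=2))
```

Because IoU divides by the union of predicted and reference pixels for each class, every error counts against both classes involved, which is one reason mIoU is typically lower than pixel accuracy on the same predictions.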

Performance Evaluation of Object Detection Deep Learning Model for Paralichthys olivaceus Disease Symptoms Classification (넙치 질병 증상 분류를 위한 객체 탐지 딥러닝 모델 성능 평가)

  • Kyung won Cho;Ran Baik;Jong Ho Jeong;Chan Jin Kim;Han Suk Choi;Seok Won Jung;Hvun Seung Son
    • Smart Media Journal
    • /
    • v.12 no.10
    • /
    • pp.71-84
    • /
    • 2023
  • Paralichthys olivaceus accounts for more than half of Korea's aquaculture industry. However, about 25-30% of the total annual breeding volume is lost to diseases, which severely harms the economic feasibility of fish farms. For the economic growth of Paralichthys olivaceus farms, it is necessary to diagnose disease symptoms quickly and accurately by automating the diagnosis of Paralichthys olivaceus diseases. In this study, we create training data using innovative data collection methods, data refinement algorithms, and dataset partitioning techniques, and compare the Paralichthys olivaceus disease symptom detection performance of four object detection deep learning models (YOLOv8, Swin, ViTDet, and MViTv2). The experimental findings indicate that the YOLOv8 model is superior in terms of average detection rate (mAP) and Estimated Time of Arrival (ETA). If the performance of the AI model proposed in this study is verified, Paralichthys olivaceus farms will be able to diagnose disease symptoms in real time, and farm productivity is expected to improve greatly through rapid preventive measures based on the diagnosis results.

Method for improving hair detection for hair loss diagnosis in Phototrichogram (모발 정밀검사에서 탈모 진단을 위한 머리카락 검출 개선 방법)

  • Bomin Kim;Byung-Cheol Park;Sang-Il Choi
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2023.01a
    • /
    • pp.89-90
    • /
    • 2023
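  • This paper proposes a method to support the diagnosis of hair loss by detecting hairs and tracking changes in hair count in scalp photographs of patients taken at regular intervals through phototrichogram examination. Color Slicing was applied to the scalp photographs used in a previously proposed hair detection method for hair loss diagnosis, so that the pixel values of the photographs are consistent. In addition, the paper proposes a model that detects hair more effectively by using Swin Transformer for hair detection together with the Hybrid Task Cascade (HTC) model, a deep learning-based image segmentation technique.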

Analyzing DNN Model Performance Depending on Backbone Network (백본 네트워크에 따른 사람 속성 검출 모델의 성능 변화 분석)

  • Chun-Su Park
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.2
    • /
    • pp.128-132
    • /
    • 2023
  • Recently, with the development of deep learning technology, research on pedestrian attribute recognition using deep neural networks has been actively conducted. Existing pedestrian attribute recognition techniques can be categorized as global-based, regional-area-based, visual attention-based, sequential prediction-based, and newly designed loss function-based, depending on how pedestrian attributes are detected. The performance of these techniques is known to vary greatly depending on the type of backbone network that constitutes the deep neural network model. Therefore, in this paper, several backbone networks are applied to a baseline pedestrian attribute recognition model and the resulting changes in model performance are analyzed. The analysis uses ResNet34, ResNet50, ResNet101, Swin-tiny, and Swinv2-tiny, which are representative backbone networks in fields such as image classification and object detection. Furthermore, this paper analyzes the change in time complexity when running inference with each backbone network on a CPU and a GPU.

Performance Analysis for Accuracy of Personality Recognition Models based on Setting of Margin Values at Face Region Extraction (얼굴 영역 추출 시 여유값의 설정에 따른 개성 인식 모델 정확도 성능 분석)

  • Qiu Xu;Gyuwon Han;Bongjae Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.24 no.1
    • /
    • pp.141-147
    • /
    • 2024
  • Recently, there has been growing interest in personalized services tailored to an individual's preferences. This has led to ongoing research aimed at recognizing and leveraging an individual's personality traits. Among various methods for personality assessment, the OCEAN model stands out as a prominent approach. In utilizing OCEAN for personality recognition, a multimodal artificial intelligence model that incorporates linguistic, paralinguistic, and non-linguistic information is often employed. This paper examines the impact of the margin value set for extracting facial areas from video data on the accuracy of a personality recognition model that uses facial expressions to determine OCEAN traits. The study employed personality recognition models based on 2D Patch Partition, R2plus1D, 3D Patch Partition, and Video Swin Transformer technologies. Setting the facial area extraction margin to 60 resulted in the highest 1-MAE performance, a score of 0.9118. These findings indicate the importance of selecting an optimal margin value to maximize the efficiency of personality recognition models.
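To make the two quantities in this abstract concrete, here is a small sketch under stated assumptions. The abstract does not say how the margin is applied or in what unit, so `expand_box` assumes a pixel margin added on every side of a face bounding box and clamped to the frame. `one_minus_mae` assumes trait scores normalized to [0, 1]. Both function names and all values are hypothetical.

```python
def expand_box(box, margin, width, height):
    """Grow an (x1, y1, x2, y2) face box by `margin` pixels per side, clamped to the frame."""
    x1, y1, x2, y2 = box
    return (
        max(0, x1 - margin),
        max(0, y1 - margin),
        min(width, x2 + margin),
        min(height, y2 + margin),
    )


def one_minus_mae(pred, target):
    """1 - mean absolute error; higher is better, 1.0 is a perfect match."""
    errors = [abs(p - t) for p, t in zip(pred, target)]
    return 1.0 - sum(errors) / len(errors)


# Hypothetical face box in a 256x256 frame, with a margin of 60 pixels (assumed unit).
crop = expand_box((50, 80, 150, 200), margin=60, width=256, height=256)

# Hypothetical predicted and reference scores for the five OCEAN traits.
score = one_minus_mae([0.5, 0.6, 0.4, 0.7, 0.5], [0.6, 0.5, 0.5, 0.6, 0.5])
print(crop, round(score, 4))
```

A larger margin keeps more context around the face (hair, neck, background) and a smaller one keeps the crop tight, which is the trade-off the study evaluates.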

Development of Deep Learning Model for Detecting Road Cracks Based on Drone Image Data (드론 촬영 이미지 데이터를 기반으로 한 도로 균열 탐지 딥러닝 모델 개발)

  • Young-Ju Kwon;Sung-ho Mun
    • Land and Housing Review
    • /
    • v.14 no.2
    • /
    • pp.125-135
    • /
    • 2023
  • Drones are used in various fields, including land survey, transportation, forestry/agriculture, marine, environment, disaster prevention, water resources, cultural assets, and construction, as their industrial importance and market size have increased. In this study, image data for deep learning were collected using a Mavic 3 drone at a shooting altitude of 20 m with ×7 magnification. Swin Transformer and UperNet were employed as the backbone and architecture of the deep learning model. About 800 labeled images were augmented to increase the amount of data. Training was carried out in three rounds: the cross-entropy loss function was used in the first and second rounds, and the Tversky loss function in the third. In the future, once the crack detection model is advanced through convergence with the Internet of Things (IoT) in additional research, it will be possible to detect patching and potholes as well. In addition, real-time detection by drones is expected to enable rapid identification of pavement sections that need maintenance.
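For readers unfamiliar with the Tversky loss mentioned above, the following is a minimal sketch of its standard binary form, 1 - TP / (TP + alpha * FP + beta * FN), written in plain Python over flat lists. The weights `alpha=0.3` and `beta=0.7` are assumed values for illustration. The abstract does not state the settings used in the study, and this is not the authors' training code.

```python
def tversky_loss(probs, targets, alpha=0.3, beta=0.7, eps=1e-7):
    """Binary Tversky loss over flat lists of probabilities and 0/1 labels.

    alpha weights false positives and beta weights false negatives;
    alpha = beta = 0.5 gives the Dice loss. The defaults are assumed values.
    """
    pairs = list(zip(probs, targets))
    tp = sum(p * t for p, t in pairs)        # soft true positives
    fp = sum(p * (1 - t) for p, t in pairs)  # soft false positives
    fn = sum((1 - p) * t for p, t in pairs)  # soft false negatives
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


# Hypothetical 4-pixel examples: one missed crack pixel vs. one false alarm.
missed = tversky_loss([1.0, 0.0, 0.0, 0.0], [1, 1, 0, 0])
false_alarm = tversky_loss([1.0, 1.0, 0.0, 0.0], [1, 0, 0, 0])
print(round(missed, 4), round(false_alarm, 4))
```

With beta larger than alpha, a missed crack pixel costs more than a false alarm, which is a common reason for choosing this loss for thin, sparse targets such as cracks.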