• Title/Summary/Keyword: Swin Transformer

Search Result 9, Processing Time 0.036 seconds

The Detection of Multi-class Vehicles using Swin Transformer (Swin Transformer를 이용한 항공사진에서 다중클래스 차량 검출)

  • Lee, Ki-chun;Jeong, Yu-seok;Lee, Chang-woo
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.10a
    • /
    • pp.112-114
    • /
    • 2021
  • In order to detect urban conditions, the number of means of transportation and traffic flow are essential factors to be identified. This paper improved the detection system capabilities shown in previous studies using the SwinTransformer model, which showed higher performance than existing convolutional neural networks, by learning various vehicle types using existing Mask R-CNN and introducing today's widely used transformer model to detect certain types of vehicles in urban aerial images.

  • PDF

Detection of video editing points using facial keypoints (얼굴 특징점을 활용한 영상 편집점 탐지)

  • Joshep Na;Jinho Kim;Jonghyuk Park
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.4
    • /
    • pp.15-30
    • /
    • 2023
  • Recently, various services using artificial intelligence(AI) are emerging in the media field as well However, most of the video editing, which involves finding an editing point and attaching the video, is carried out in a passive manner, requiring a lot of time and human resources. Therefore, this study proposes a methodology that can detect the edit points of video according to whether person in video are spoken by using Video Swin Transformer. First, facial keypoints are detected through face alignment. To this end, the proposed structure first detects facial keypoints through face alignment. Through this process, the temporal and spatial changes of the face are reflected from the input video data. And, through the Video Swin Transformer-based model proposed in this study, the behavior of the person in the video is classified. Specifically, after combining the feature map generated through Video Swin Transformer from video data and the facial keypoints detected through Face Alignment, utterance is classified through convolution layers. In conclusion, the performance of the image editing point detection model using facial keypoints proposed in this paper improved from 87.46% to 89.17% compared to the model without facial keypoints.

Cloud Detection from Sentinel-2 Images Using DeepLabV3+ and Swin Transformer Models (DeepLabV3+와 Swin Transformer 모델을 이용한 Sentinel-2 영상의 구름탐지)

  • Kang, Jonggu;Park, Ganghyun;Kim, Geunah;Youn, Youjeong;Choi, Soyeon;Lee, Yangwon
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_2
    • /
    • pp.1743-1747
    • /
    • 2022
  • Sentinel-2 can be used as proxy data for the Korean Compact Advanced Satellite 500-4 (CAS500-4), also known as Agriculture and Forestry Satellite, in terms of spectral wavelengths and spatial resolution. This letter examined cloud detection for later use in the CAS500-4 based on deep learning technologies. DeepLabV3+, a traditional Convolutional Neural Network (CNN) model, and Shifted Windows (Swin) Transformer, a state-of-the-art (SOTA) Transformer model, were compared using 22,728 images provided by Radiant Earth Foundation (REF). Swin Transformer showed a better performance with a precision of 0.886 and a recall of 0.875, which is a balanced result, unbiased between over- and under-estimation. Deep learning-based cloud detection is expected to be a future operational module for CAS500-4 through optimization for the Korean Peninsula.

Semantic Segmentation of the Habitats of Ecklonia Cava and Sargassum in Undersea Images Using HRNet-OCR and Swin-L Models (HRNet-OCR과 Swin-L 모델을 이용한 조식동물 서식지 수중영상의 의미론적 분할)

  • Kim, Hyungwoo;Jang, Seonwoong;Bak, Suho;Gong, Shinwoo;Kwak, Jiwoo;Kim, Jinsoo;Lee, Yangwon
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.5_3
    • /
    • pp.913-924
    • /
    • 2022
  • In this paper, we presented a database construction of undersea images for the Habitats of Ecklonia cava and Sargassum and conducted an experiment for semantic segmentation using state-of-the-art (SOTA) models such as High Resolution Network-Object Contextual Representation (HRNet-OCR) and Shifted Windows-L (Swin-L). The result showed that our segmentation models were superior to the existing experiments in terms of the 29% increased mean intersection over union (mIOU). Swin-L model produced better performance for every class. In particular, the information of the Ecklonia cava class that had small data were also appropriately extracted by Swin-L model. Target objects and the backgrounds were well distinguished owing to the Transformer backbone better than the legacy models. A bigger database under construction will ensure more accuracy improvement and can be utilized as deep learning database for undersea images.

Waterbody Detection for the Reservoirs in South Korea Using Swin Transformer and Sentinel-1 Images (Swin Transformer와 Sentinel-1 영상을 이용한 우리나라 저수지의 수체 탐지)

  • Soyeon Choi;Youjeong Youn;Jonggu Kang;Seoyeon Kim;Yemin Jeong;Yungyo Im;Youngmin Seo;Wanyub Kim;Minha Choi;Yangwon Lee
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_3
    • /
    • pp.949-965
    • /
    • 2023
  • In this study, we propose a method to monitor the surface area of agricultural reservoirs in South Korea using Sentinel-1 synthetic aperture radar images and the deep learning model, Swin Transformer. Utilizing the Google Earth Engine platform, datasets from 2017 to 2021 were constructed for seven agricultural reservoirs, categorized into 700 K-ton, 900 K-ton, and 1.5 M-ton capacities. For four of the reservoirs, a total of 1,283 images were used for model training through shuffling and 5-fold cross-validation techniques. Upon evaluation, the Swin Transformer Large model, configured with a window size of 12, demonstrated superior semantic segmentation performance, showing an average accuracy of 99.54% and a mean intersection over union (mIoU) of 95.15% for all folds. When the best-performing model was applied to the datasets of the remaining three reservoirsfor validation, it achieved an accuracy of over 99% and mIoU of over 94% for all reservoirs. These results indicate that the Swin Transformer model can effectively monitor the surface area of agricultural reservoirs in South Korea.

Method for improving hair detection for hair loss diagnosis in Phototrichogram (모발 정밀검사에서 탈모 진단을 위한 머리카락 검출 개선 방법)

  • Bomin Kim;Byung-Cheol Park;Sang-Il Choi
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2023.01a
    • /
    • pp.89-90
    • /
    • 2023
  • 본 논문은 모발 정밀검사(Phototrichogram)를 통해 일정 간격을 두고 촬영된 환자의 모발 두피 사진을 이용하여 머리카락 검출 및 개수 변화 추이에 따른 환자의 탈모 진단에 도움을 줄 방법을 제안한다. 기존의 탈모 진단을 위해 제안하였던 머리카락 검출 방법에서 사용한 환자의 모발 두피 사진에 Color Slicing을 적용하여 환자의 두피 모발 사진의 픽셀값을 통일성 있게 구성하였다. 또한, 머리카락 검출하기 위한 방법으로 Swin Transformer를 사용하고, 딥러닝 기반의 영상 분할 기법(Image Segmentation)의 하나인 HTC(Hybrid Task Cascade) 모델을 활용하여 좀 더 효과적으로 머리카락을 검출할 수 있는 모델을 제안한다.

  • PDF

Development of Deep Learning Model for Detecting Road Cracks Based on Drone Image Data (드론 촬영 이미지 데이터를 기반으로 한 도로 균열 탐지 딥러닝 모델 개발)

  • Young-Ju Kwon;Sung-ho Mun
    • Land and Housing Review
    • /
    • v.14 no.2
    • /
    • pp.125-135
    • /
    • 2023
  • Drones are used in various fields, including land survey, transportation, forestry/agriculture, marine, environment, disaster prevention, water resources, cultural assets, and construction, as their industrial importance and market size have increased. In this study, image data for deep learning was collected using a mavic3 drone capturing images at a shooting altitude was 20 m with ×7 magnification. Swin Transformer and UperNet were employed as the backbone and architecture of the deep learning model. About 800 sheets of labeled data were augmented to increase the amount of data. The learning process encompassed three rounds. The Cross-Entropy loss function was used in the first and second learning; the Tversky loss function was used in the third learning. In the future, when the crack detection model is advanced through convergence with the Internet of Things (IoT) through additional research, it will be possible to detect patching or potholes. In addition, it is expected that real-time detection tasks of drones can quickly secure the detection of pavement maintenance sections.

Performance Analysis for Accuracy of Personality Recognition Models based on Setting of Margin Values at Face Region Extraction (얼굴 영역 추출 시 여유값의 설정에 따른 개성 인식 모델 정확도 성능 분석)

  • Qiu Xu;Gyuwon Han;Bongjae Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.24 no.1
    • /
    • pp.141-147
    • /
    • 2024
  • Recently, there has been growing interest in personalized services tailored to an individual's preferences. This has led to ongoing research aimed at recognizing and leveraging an individual's personality traits. Among various methods for personality assessment, the OCEAN model stands out as a prominent approach. In utilizing OCEAN for personality recognition, a multi modal artificial intelligence model that incorporates linguistic, paralinguistic, and non-linguistic information is often employed. This paper examines the impact of the margin value set for extracting facial areas from video data on the accuracy of a personality recognition model that uses facial expressions to determine OCEAN traits. The study employed personality recognition models based on 2D Patch Partition, R2plus1D, 3D Patch Partition, and Video Swin Transformer technologies. It was observed that setting the facial area extraction margin to 60 resulted in the highest 1-MAE performance, scoring at 0.9118. These findings indicate the importance of selecting an optimal margin value to maximize the efficiency of personality recognition models.

A high-density gamma white spots-Gaussian mixture noise removal method for neutron images denoising based on Swin Transformer UNet and Monte Carlo calculation

  • Di Zhang;Guomin Sun;Zihui Yang;Jie Yu
    • Nuclear Engineering and Technology
    • /
    • v.56 no.2
    • /
    • pp.715-727
    • /
    • 2024
  • During fast neutron imaging, besides the dark current noise and readout noise of the CCD camera, the main noise in fast neutron imaging comes from high-energy gamma rays generated by neutron nuclear reactions in and around the experimental setup. These high-energy gamma rays result in the presence of high-density gamma white spots (GWS) in the fast neutron image. Due to the microscopic quantum characteristics of the neutron beam itself and environmental scattering effects, fast neutron images typically exhibit a mixture of Gaussian noise. Existing denoising methods in neutron images are difficult to handle when dealing with a mixture of GWS and Gaussian noise. Herein we put forward a deep learning approach based on the Swin Transformer UNet (SUNet) model to remove high-density GWS-Gaussian mixture noise from fast neutron images. The improved denoising model utilizes a customized loss function for training, which combines perceptual loss and mean squared error loss to avoid grid-like artifacts caused by using a single perceptual loss. To address the high cost of acquiring real fast neutron images, this study introduces Monte Carlo method to simulate noise data with GWS characteristics by computing the interaction between gamma rays and sensors based on the principle of GWS generation. Ultimately, the experimental scenarios involving simulated neutron noise images and real fast neutron images demonstrate that the proposed method not only improves the quality and signal-to-noise ratio of fast neutron images but also preserves the details of the original images during denoising.