• Title/Summary/Keyword: image dataset (이미지 데이터 셋)

Search results: 283

Sound Event Classification Based on Concatenated Residual Network Applicable to Closed Captioning Services for the Hearing Impaired (청각장애인용 자막방송 서비스를 위한 연쇄잔차 신경망 기반 음향 사건 분류 기법)

  • Kim, Nam Kyun;Park, Dong Keun;Kim, Jun Ho;Kim, Hong Kook;Ahn, Chung Hyun
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2020.07a / pp.472-475 / 2020
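  • This paper proposes a technique for classifying the sound events that appear in audio content, in order to provide closed captioning for the hearing impaired. The proposed technique uses a concatenated residual network structure that connects multiple residual networks (ResNets). Two input features are extracted: a 2D image formed by combining the mel-frequency cepstrum vectors of the audio over multiple frames, and a 1D statistical feature vector computed from the mel-frequency cepstrum vectors of all frames. The two inputs are modeled by a 2D residual network and a 1D residual network, respectively, and the two networks are joined by concatenation to form the concatenated residual network. In a performance evaluation using 6-fold cross-validation on the collected dataset, the technique achieved a classification accuracy of 85.48%.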

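
The two input features described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the choice of per-coefficient mean and standard deviation as the "statistical" features, and the names `build_inputs` and `concatenate_embeddings`, are assumptions made here.

```python
import numpy as np


def build_inputs(mfcc):
    """Build the two network inputs from an MFCC matrix.

    mfcc: array of shape (n_frames, n_coeffs), one MFCC vector per frame.
    Returns a 2D "image" (n_coeffs, n_frames) and a 1D statistics vector.
    """
    mfcc = np.asarray(mfcc, dtype=np.float64)
    # 2D input: frames stacked side by side, coefficients along the rows.
    image_2d = mfcc.T
    # 1D input: statistics over all frames (mean and std are an assumption;
    # the abstract does not say which statistics are used).
    stats_1d = np.concatenate([mfcc.mean(axis=0), mfcc.std(axis=0)])
    return image_2d, stats_1d


def concatenate_embeddings(emb_2d, emb_1d):
    """Join the outputs of the 2D and 1D branches into one feature vector."""
    return np.concatenate([np.ravel(emb_2d), np.ravel(emb_1d)])


# Toy example: 4 frames, 3 coefficients.
toy_mfcc = np.arange(12).reshape(4, 3)
toy_image, toy_stats = build_inputs(toy_mfcc)
```

In a full model each input would pass through its own residual network before the concatenation step; here `concatenate_embeddings` only shows the joining operation.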

Algorithm for Classification of Alzheimer's Dementia based on MRI Image (MRI 이미지 기반의 알츠하이머 치매분류 알고리즘)

  • Lee, Jae-kyung;Seo, Jin-beom;Cho, Young-bok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.97-99 / 2021
  • As society has continued to age in recent years, interest in dementia has been increasing. Alzheimer's disease is a degenerative brain disease that accounts for the largest share of dementia patients. Because the medical community currently offers no clear means of prevention or treatment, early treatment and early prevention are emphasized. In this paper, we aim to find the most efficient activation function by combining various activation functions in convolutional neural networks, using MRI datasets of healthy people and patients with Alzheimer's disease. The resulting Alzheimer's dementia classification model is also intended for future use in the medical field.


Comparative Analysis of CNN Techniques designed for Rotated Object Classification (회전된 객체 분류를 위한 CNN 기법들의 성능 비교 분석)

  • Hee-Il Hahn
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.1 / pp.181-187 / 2024
  • Two well-known CNN methods, the group equivariant CNN and the CNN using steerable filters, achieve excellent classification performance on randomly rotated objects in image space. This paper describes their mathematical structures and introduces implementation methods. We implement both, along with a conventional CNN with the same number of filters, and compare their performance in simulations on randomly rotated MNIST. In the experiments, the steerable CNN classifies better than the others and has relatively few parameters to learn, so its performance degrades relatively little even when the training dataset is reduced in size.
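
The idea behind rotation-equivariant filters can be illustrated with a toy example. It is restricted to 90-degree rotations and is not the paper's implementation. The helper names `correlate_valid` and `orientation_scores` are hypothetical. Correlating an image with four rotated copies of one filter gives four responses that are only cyclically shifted when the input is rotated, so their maximum is unchanged.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(image, kernel):
    """Plain 'valid' cross-correlation of a 2D image with a 2D kernel."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def orientation_scores(image, kernel):
    """Peak response of the image to the kernel rotated by 0, 90, 180, 270 degrees."""
    return np.array(
        [correlate_valid(image, np.rot90(kernel, r)).max() for r in range(4)]
    )


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

scores = orientation_scores(image, kernel)
# Rotating the input by 90 degrees shifts the orientation scores by one slot.
rotated_scores = orientation_scores(np.rot90(image), kernel)
```

Taking the maximum over the four orientation scores then gives a feature that does not change under 90-degree rotations of the input. Steerable filters extend this idea to arbitrary angles.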

The Comparison of Segmentation Performance between SegFormer and U-Net on Railway Components (SegFormer 및 U-Net의 철도 구성요소 객체 분할 성능 비교)

  • Jaehyun Lee;Changjoon Park;Namjung Kim;Junhwi Park;Jeonghwan Gwak
    • Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.347-348 / 2024
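  • This paper proposes applying a pretrained SegFormer model as an efficient segmentation method for monitoring railway components, and compares its performance with the U-Net model commonly used for segmentation. Training used a dataset labeled to segment the main railway components: rail, sleeper, fastening device, and background. SegFormer improved on the U-Net baseline by 5.29% in Jaccard score. This confirms that the Vision Transformer-based model overcomes a limitation of existing CNN-based models, which have relative difficulty capturing the global context of an image, and is a more effective model for railway component segmentation.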

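
The Jaccard score used for the comparison above is the intersection over union of predicted and ground-truth regions. A minimal per-class sketch for label masks is shown below. The function names and the macro-averaging over classes are assumptions, since the abstract does not state how the score was aggregated.

```python
import numpy as np


def jaccard_per_class(y_true, y_pred, num_classes):
    """Intersection over union for each class label in two integer masks."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(num_classes):
        true_c = y_true == c
        pred_c = y_pred == c
        union = np.logical_or(true_c, pred_c).sum()
        if union == 0:
            # Class absent from both masks: score is undefined.
            scores.append(float("nan"))
        else:
            scores.append(np.logical_and(true_c, pred_c).sum() / union)
    return scores


def mean_jaccard(y_true, y_pred, num_classes):
    """Average of the per-class scores, ignoring undefined classes."""
    return float(np.nanmean(jaccard_per_class(y_true, y_pred, num_classes)))


# Toy 2x2 masks with two classes (0 = background, 1 = rail, for illustration).
truth = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
per_class = jaccard_per_class(truth, pred, num_classes=2)
```

For the toy masks, class 0 overlaps in one of two pixels and class 1 in two of three, so the per-class scores are 1/2 and 2/3.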

A Study about Learning Graph Representation on Farmhouse Apple Quality Images with Graph Transformer (그래프 트랜스포머 기반 농가 사과 품질 이미지의 그래프 표현 학습 연구)

  • Ji Hun Bae;Ju Hwan Lee;Gwang Hyun Yu;Gyeong Ju Kwon;Jin Young Kim
    • Smart Media Journal / v.12 no.1 / pp.9-16 / 2023
  • Convolutional neural network (CNN) based systems are being developed to overcome the limits of human resources in farmhouse apple quality classification. However, because convolutional neural networks accept only images of the same size, preprocessing such as sampling may be required, and oversampling loses information from the original image through quality degradation and blurring. To minimize this problem, this paper generates an image-patch-based graph of the original image and proposes a random-walk-based positional encoding method for applying the graph transformer model. The method uses the random walk algorithm to continuously learn position embeddings for patches that have no positional information, and finds the optimal graph structure by aggregating useful node information through the self-attention of the graph transformer model. It is therefore robust and performs well even on new graph structures with random node order and on arbitrary graph structures that depend on the location of an object in an image. In experiments on 5 apple quality datasets, the learning accuracy was 1.3% to 4.7% higher than that of other GNN models, and the number of parameters was 3.59M, about 15% of the 23.52M of the ResNet18 model. The reduced computation thus yields fast inference, which demonstrates the method's effectiveness.
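
A common formulation of random-walk positional encoding assigns each node the probabilities of returning to itself after 1, 2, ..., k random-walk steps. The sketch below uses that formulation on a patch graph. It is an illustration under assumptions, not the authors' method: the 4-neighbor grid connectivity and the names `grid_adjacency` and `random_walk_pe` are hypothetical choices.

```python
import numpy as np


def grid_adjacency(rows, cols):
    """Adjacency matrix of image patches connected to their 4-neighbors."""
    n = rows * cols
    adj = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:  # right neighbor
                adj[i, i + 1] = adj[i + 1, i] = 1.0
            if r + 1 < rows:  # lower neighbor
                adj[i, i + cols] = adj[i + cols, i] = 1.0
    return adj


def random_walk_pe(adj, k):
    """Return an (n_nodes, k) matrix of 1..k step return probabilities."""
    deg = adj.sum(axis=1)
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    walk = adj * inv_deg[:, None]  # row-normalized transition matrix
    n = adj.shape[0]
    pe = np.empty((n, k))
    power = np.eye(n)
    for step in range(k):
        power = power @ walk
        pe[:, step] = np.diag(power)
    return pe


# A 2x2 patch grid forms a 4-cycle: walks return only after an even number of steps.
pe = random_walk_pe(grid_adjacency(2, 2), k=3)
```

Because the encoding depends only on graph structure, it does not change when the nodes are listed in a different order. This matches the abstract's emphasis on robustness to random node order.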

Stereo Semi-direct Visual Odometry with Adaptive Motion Prior Weights of Lunar Exploration Rover (달 탐사 로버의 적응형 움직임 가중치에 따른 스테레오 준직접방식 비주얼 오도메트리)

  • Jung, Jae Hyung;Heo, Se Jong;Park, Chan Gook
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.46 no.6 / pp.479-486 / 2018
  • To ensure reliable navigation performance for a lunar exploration rover, navigation algorithms that use additional sensors such as inertial measurement units and cameras are essential on the lunar surface, where no global navigation satellite system is available. Visual odometry (VO) using a stereo camera has already been implemented successfully on the US Mars rovers. In this paper, we estimate the 6-DOF pose of a lunar exploration rover from grayscale images of lunar-like terrain. The proposed algorithm estimates the relative pose between consecutive images using semi-direct VO based on sparse image alignment. To overcome the vulnerability of direct VO to non-linearity, we add adaptive motion prior weights, calculated from a linear function of the previous pose, to the optimization cost function. The proposed algorithm is verified on a lunar-like terrain dataset recorded by the University of Toronto, which reflects the characteristics of the actual lunar environment.

Study on Detection Technique for Sea Fog by using CCTV Images and Convolutional Neural Network (CCTV 영상과 합성곱 신경망을 활용한 해무 탐지 기법 연구)

  • Kim, Na-Kyeong;Bak, Su-Ho;Jeong, Min-Ji;Hwang, Do-Hyun;Enkhjargal, Unuzaya;Park, Mi-So;Kim, Bo-Ram;Yoon, Hong-Joo
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.15 no.6 / pp.1081-1088 / 2020
  • This paper proposes a method for detecting sea fog in CCTV images based on a convolutional neural network. For the study data, 10,004 images, labeled sea-fog or not sea-fog using a visibility threshold of 1 km, were randomly extracted from a total of 11 ports or beaches (Busan Port, Busan New Port, Pyeongtaek Port, Incheon Port, Gunsan Port, Daesan Port, Mokpo Port, Yeosu Gwangyang Port, Ulsan Port, Pohang Port, and Haeundae Beach). 80% of the 10,004 images were used to train the convolutional neural network model. The model has 16 convolutional layers and 3 fully connected layers, with Softmax classification performed in the last fully connected layer. Accuracy was evaluated on the remaining 20%, yielding a classification accuracy of about 96%.

Adaptive Face Mask Detection System based on Scene Complexity Analysis

  • Kang, Jaeyong;Gwak, Jeonghwan
    • Journal of the Korea Society of Computer and Information / v.26 no.5 / pp.1-8 / 2021
  • Coronavirus disease 2019 (COVID-19) has seriously affected the world. Everyone is required to wear a mask properly in public areas to prevent the virus from spreading, yet many people do not. In this paper, we propose an efficient mask detection system. The system first detects faces in the input image using YOLOv5 and classifies the image into one of three scene complexity classes (Simple, Moderate, and Complex) based on the number of detected faces. The image is then fed into Faster R-CNN with one of three ResNet backbones (ResNet-18, 50, or 101), chosen according to the scene complexity, to detect the face area and identify whether each person is wearing a mask properly. We evaluated the system on public mask detection datasets, and the results show that it outperforms other models.
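
The routing step described above, which picks a backbone from the number of detected faces, can be sketched as follows. The abstract gives neither the face-count thresholds nor which backbone goes with which class. The thresholds (`simple_max=2`, `moderate_max=5`) and the mapping from simpler scenes to shallower networks are therefore hypothetical values chosen only for illustration.

```python
# Hypothetical mapping: simpler scenes use shallower backbones (an assumption).
BACKBONE_BY_COMPLEXITY = {
    "Simple": "ResNet-18",
    "Moderate": "ResNet-50",
    "Complex": "ResNet-101",
}


def classify_scene(num_faces, simple_max=2, moderate_max=5):
    """Map a detected-face count to a scene complexity class.

    The threshold defaults are illustrative, not taken from the paper.
    """
    if num_faces < 0:
        raise ValueError("num_faces must be non-negative")
    if num_faces <= simple_max:
        return "Simple"
    if num_faces <= moderate_max:
        return "Moderate"
    return "Complex"


def select_backbone(num_faces):
    """Choose the Faster R-CNN backbone name for an image."""
    return BACKBONE_BY_COMPLEXITY[classify_scene(num_faces)]


# Face counts as a first-stage detector such as YOLOv5 might report them.
choices = [select_backbone(n) for n in (1, 4, 12)]
```

In a complete pipeline the face count would come from the first-stage detector, and the selected name would determine which pretrained detector instance handles the image.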

A Study on the Generation of Webtoons through Fine-Tuning of Diffusion Models (확산모델의 미세조정을 통한 웹툰 생성연구)

  • Kyungho Yu;Hyungju Kim;Jeongin Kim;Chanjun Chun;Pankoo Kim
    • Smart Media Journal / v.12 no.7 / pp.76-83 / 2023
  • This study proposes a method to assist webtoon artists in creating webtoons by using a pretrained Text-to-Image model to generate webtoon images from text. The approach fine-tunes a pretrained Stable Diffusion model on a webtoon dataset transformed into the desired webtoon style. Fine-tuning with the LoRA technique completes in approximately 4.5 hours over 30,000 steps. The generated images represent shapes and backgrounds according to the input text, producing webtoon-like images. Furthermore, quantitative evaluation using the Inception score shows that the proposed method outperforms DCGAN-based Text-to-Image models. If webtoon artists adopt the proposed Text-to-Image model, it is expected to significantly reduce the time required for the creative process.

A Study on the Production of 3D Datasets for Stone Pagodas by Period in Korea

  • Byong-Kwon Lee;Eun-Ji Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.9 / pp.105-111 / 2023
  • Currently, most content restoration using artificial intelligence relies on 2D learning. 3D learning remains immature because moving from two axes (X, Y) to three (X, Y, Z) requires far more computation and slows training. The purpose of this paper is to secure a dataset for artificial intelligence learning by analyzing Korean stone pagodas by era and modeling them in 3D from two-dimensional information (images) of the cultural assets. In addition, we analyzed the differences and characteristics of pagodas in each era in Korea and proposed a feature modeling method suitable for artificial intelligence learning. Restoration of cultural properties relies on a variety of materials, expert techniques, and historical archives. By recording and managing the information necessary for such restoration, this study is expected to serve as an important documentary resource for restoring and maintaining traditional Korean pagodas in the future.