• Title/Abstract/Keywords: Image Training Dataset


GAN 적대적 생성 신경망과 이미지 생성 및 변환 기술 동향 (Research Trends of Generative Adversarial Networks and Image Generation and Translation)

  • 조영주;배강민;박종열
    • Electronics and Telecommunications Trends / Vol. 35, No. 4 / pp. 91-102 / 2020
  • Generative adversarial networks (GANs) have recently emerged as a rapidly growing field of research in which many studies have shown impressive results. Initially, GANs could do little more than imitate their training dataset, but they are now useful in many areas, such as converting data between categories, restoring erased parts of images, copying human facial expressions, and creating artworks in the style of a dead painter. Despite these outstanding achievements, GANs still face several challenges. First, they require large amounts of memory. Second, there are technical limits to processing high-resolution images beyond 4K. Third, many GAN training methods suffer from instability during the training stage. Nevertheless, recent results produce images that are difficult to distinguish from real ones even with the naked eye, and resolutions of 4K and above are being achieved. With these gains in image quality and resolution, many applications in design and in image and video editing have become available, including tools that turn a simple sketch into a photorealistic image or easily remove unwanted parts of an image or video. In this paper, we discuss how GANs began, including their base architecture, and survey the latest GAN technologies used in high-resolution, high-quality image creation, image and video editing, style translation, and content transfer.
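As background for the adversarial training the surveyed work builds on, the following is a minimal sketch of the original GAN objective in PyTorch; the two toy MLPs and all hyperparameters are illustrative assumptions, not an architecture from the surveyed papers.

```python
# Minimal GAN training sketch (illustrative toy MLPs, not a surveyed architecture).
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # e.g., flattened 28x28 images

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):  # real: (batch, data_dim) tensor from the training set
    batch = real.size(0)
    fake = G(torch.randn(batch, latent_dim))

    # Discriminator: push real toward 1, generated toward 0.
    loss_d = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: fool the discriminator (non-saturating loss).
    loss_g = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```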

전이학습을 이용한 UNet 기반 건물 추출 딥러닝 모델의 학습률에 따른 성능 향상 분석 (Performance Improvement Analysis of Building Extraction Deep Learning Model Based on UNet Using Transfer Learning at Different Learning Rates)

  • 예철수;안영만;백태웅;김경태
    • Korean Journal of Remote Sensing / Vol. 39, No. 5-4 / pp. 1111-1123 / 2023
  • Semantic image segmentation based on deep learning models has recently become widely used to monitor changes in land-surface attributes from remote sensing imagery. Improving the performance of UNet, the representative deep learning model for semantic image segmentation, and of the many UNet-based models derived from it requires a sufficiently large training dataset. As the training dataset grows, however, the hardware requirements and the training time increase sharply. Transfer learning addresses this problem: it is an effective way to improve model performance without a large-scale training dataset. In this paper, we present three transfer learning models that combine UNet-based deep learning models with the representative pretrained models VGG19 and ResNet50, namely the UNet-ResNet50, UNet-VGG19, and CBAM-DRUNet-VGG19 models, and apply them to building extraction to analyze the accuracy gains from transfer learning. Because deep learning performance is strongly affected by the learning rate, we also analyze how each model's performance varies with the learning-rate setting. The Kompsat-3A, WHU, and INRIA datasets were used to evaluate building extraction performance; averaged over the three datasets, the UNet-ResNet50 model improved accuracy by 5.1% over the plain UNet model, while the UNet-VGG19 and CBAM-DRUNet-VGG19 models both improved it by 7.2%.
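The following is a minimal sketch of the kind of pretrained-encoder transfer the paper describes, assuming the segmentation_models_pytorch package; the paper's CBAM-DRUNet-VGG19 variant and its tuned learning rates are not reproduced here.

```python
# Sketch: UNet with an ImageNet-pretrained ResNet50 encoder for building extraction.
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet50",     # swap in "vgg19" for the UNet-VGG19 variant
    encoder_weights="imagenet",  # transfer learning: reuse pretrained encoder weights
    in_channels=3,
    classes=1,                   # binary mask: building vs. background
)

loss_fn = smp.losses.DiceLoss(mode="binary")
# The paper reports that accuracy depends strongly on the learning rate;
# 1e-4 here is an illustrative starting point, not the paper's tuned value.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, masks):  # images: (B,3,H,W); masks: (B,1,H,W) in {0,1}
    optimizer.zero_grad()
    loss = loss_fn(model(images), masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```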

Preliminary study of artificial intelligence-based fuel-rod pattern analysis of low-quality tomographic image of fuel assembly

  • Seong, Saerom;Choi, Sehwan;Ahn, Jae Joon;Choi, Hyung-joo;Chung, Yong Hyun;You, Sei Hwan;Yeom, Yeon Soo;Choi, Hyun Joon;Min, Chul Hee
    • Nuclear Engineering and Technology / Vol. 54, No. 10 / pp. 3943-3948 / 2022
  • Single-photon emission computed tomography is one of the reliable pin-by-pin verification techniques for spent-fuel assemblies. One of the challenges with this technique is to increase the total fuel-assembly verification speed while maintaining high verification accuracy. The aim of the present study, therefore, was to develop an artificial intelligence (AI) algorithm-based tomographic image analysis technique for partial-defect verification of fuel assemblies. Using the Monte Carlo (MC) simulation technique, a tomographic image dataset consisting of 511 fuel-rod patterns of a 3 × 3 fuel assembly was generated, and the VGG16, GoogLeNet, and ResNet models were trained on these images. In an evaluation of these models for different training dataset sizes, the ResNet model showed 100% pattern estimation accuracy. Across different tomographic image qualities, all of the models showed almost 100% pattern estimation accuracy, even for low-quality images with visually unrecognizable fuel patterns. This study verified that an AI model can be effectively employed for accurate and fast partial-defect verification of fuel assemblies.
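A hedged sketch of the classification setup the abstract implies, using torchvision: an ImageNet-pretrained ResNet whose final layer is replaced with a 511-way output, one class per rod pattern. The exact ResNet depth and training details are assumptions.

```python
# Sketch: fine-tuning a pretrained ResNet to classify tomographic images
# into one of 511 fuel-rod patterns. Illustrative only.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 511)  # one class per rod pattern

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):  # images: (B,3,H,W); labels: (B,) in [0, 511)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```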

다단계 전이 학습을 이용한 유방암 초음파 영상 분류 응용 (Proper Base-model and Optimizer Combination Improves Transfer Learning Performance for Ultrasound Breast Cancer Classification)

  • 겔란 아야나;박진형;최세운
    • Proceedings of the Korea Institute of Information and Communication Engineering Conference / 2021 Fall Conference / pp. 655-657 / 2021
  • Research on the early diagnosis of breast cancer using artificial intelligence algorithms has been active in recent years, but existing approaches show various limitations in processing speed and accuracy relative to users' needs. To address this problem, this paper proposes a multistage transfer learning scheme in which a ResNet model pretrained on ImageNet is first transferred to microscopy-based cancer-cell images and then transferred again to classify breast ultrasound images as benign or malignant. The proposed multistage transfer learning algorithm achieved over 96% accuracy in classifying breast ultrasound images, and we expect that adding cancer cell lines and real-time image processing will further improve its utility and accuracy.
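A sketch of the multistage idea under stated assumptions (ResNet50 as the ResNet variant, Adam as the optimizer, data loaders supplied by the caller); the authors' actual configuration is not given in the abstract.

```python
# Sketch: multistage transfer learning, ImageNet -> microscopy cancer-cell
# images -> breast ultrasound (benign/malignant). Illustrative assumptions only.
import torch
import torch.nn as nn
from torchvision import models

def fit(model, loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

def multistage_transfer(microscopy_loader, ultrasound_loader, num_cell_classes):
    # Stage 0: start from ImageNet weights.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    # Stage 1: transfer to microscopy cancer-cell images.
    model.fc = nn.Linear(model.fc.in_features, num_cell_classes)
    fit(model, microscopy_loader, epochs=10, lr=1e-4)
    # Stage 2: transfer again to benign/malignant ultrasound classification.
    model.fc = nn.Linear(model.fc.in_features, 2)
    fit(model, ultrasound_loader, epochs=10, lr=1e-5)
    return model
```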


이미지 캡셔닝 기반의 새로운 위험도 측정 모델 (A Novel Image Captioning based Risk Assessment Model)

  • 전민성;고재필;최경주
    • The Journal of Information Systems / Vol. 32, No. 4 / pp. 119-136 / 2023
  • Purpose: We introduce a surveillance system explicitly designed to overcome the limitations of conventional surveillance systems, which typically focus on object-centric behavior analysis. Design/methodology/approach: The study introduces an approach to risk assessment in surveillance that employs image captioning to generate descriptive captions capturing the interactions among objects, actions, and spatial elements within observed scenes. To support this methodology, we built a dataset of [image-caption-danger score] pairs for training. We fine-tuned the BLIP-2 model on this dataset and used BERT to interpret the semantic content of the generated captions for assessing risk levels. Findings: In a series of experiments on our self-constructed datasets, we show that these datasets provide rich information for risk assessment and yield outstanding performance in this area. Compared with models pretrained on established datasets, our generated captions more thoroughly cover the object attributes, behaviors, and spatial context crucial for the surveillance system. They also adapt to novel sentence structures, ensuring versatility across a range of contexts.
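A minimal inference-time sketch of the caption-then-score pipeline, assuming the Hugging Face transformers API; the public base checkpoints named below stand in for the authors' fine-tuned BLIP-2 and BERT models.

```python
# Sketch: BLIP-2 generates a caption, then a BERT regression head maps the
# caption to a danger score. Base checkpoints stand in for fine-tuned ones.
import torch
from PIL import Image
from transformers import (Blip2Processor, Blip2ForConditionalGeneration,
                          AutoTokenizer, AutoModelForSequenceClassification)

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
captioner = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
scorer = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1)  # single-output regression head

def risk_score(image: Image.Image) -> float:
    inputs = processor(images=image, return_tensors="pt")
    caption_ids = captioner.generate(**inputs, max_new_tokens=40)
    caption = processor.batch_decode(caption_ids, skip_special_tokens=True)[0]
    with torch.no_grad():
        return scorer(**tok(caption, return_tensors="pt")).logits.item()
```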

Deep learning framework for bovine iris segmentation

  • Heemoon Yoon;Mira Park;Hayoung Lee;Jisoon An;Taehyun Lee;Sang-Hee Lee
    • Journal of Animal Science and Technology / Vol. 66, No. 1 / pp. 167-177 / 2024
  • Iris segmentation is an initial step in identifying the biometrics of animals when establishing a traceability system for livestock. In this study, we propose a deep learning framework for pixel-wise segmentation of the bovine iris that minimizes the use of annotation labels, using the public BovineAAEyes80 dataset. The proposed image segmentation framework encompasses data collection, data preparation, selection of data augmentation, training of 15 deep neural network (DNN) models with varying encoder backbones and segmentation decoders, and evaluation of the models using multiple metrics and graphical segmentation results. The framework aims to provide comprehensive, in-depth information on each model's training and testing outcomes to optimize bovine iris segmentation performance. In the experiment, U-Net with a VGG16 backbone was identified as the optimal encoder-decoder combination for the dataset, achieving an accuracy of 99.50% and a Dice coefficient of 98.35%. Notably, the selected model accurately segmented even corrupted images that lacked proper annotation data. This study contributes to the advancement of iris segmentation and the establishment of a reliable DNN training framework.
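For reference, the Dice coefficient reported above can be computed as follows; this is an illustrative implementation, not the authors' evaluation code.

```python
# Sketch: Dice coefficient on binary segmentation masks.
import torch

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> float:
    """pred, target: binary masks of identical shape (values in {0, 1})."""
    pred, target = pred.float().flatten(), target.float().flatten()
    intersection = (pred * target).sum()
    return ((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)).item()

# A perfect prediction scores 1.0.
mask = torch.randint(0, 2, (1, 256, 256))
assert abs(dice_coefficient(mask, mask) - 1.0) < 1e-6
```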

ETLi: Efficiently annotated traffic LiDAR dataset using incremental and suggestive annotation

  • Kang, Jungyu;Han, Seung-Jun;Kim, Nahyeon;Min, Kyoung-Wook
    • ETRI Journal / Vol. 43, No. 4 / pp. 630-639 / 2021
  • Autonomous driving requires computerized perception of the environment for safety and for machine-learning evaluation. Recognizing semantic information is difficult because objects in the environment must be recognized and distinguished instantly. Training a model with real-time semantic capability and high reliability requires extensive, specialized datasets; however, such generalized datasets are unavailable and are typically difficult to construct for specific tasks. Hence, we propose a light detection and ranging (LiDAR) semantic dataset suitable for semantic simultaneous localization and mapping and specialized for autonomous driving. The dataset is provided in a form that can be easily used by those familiar with existing two-dimensional image datasets, and it covers various weather and lighting conditions collected in a complex and diverse practical setting. An incremental and suggestive annotation routine is proposed to improve annotation efficiency: a model is trained to simultaneously predict segmentation labels and suggest class-representative frames. Experimental results demonstrate that the proposed algorithm yields a more efficient dataset than uniformly sampled datasets.
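The abstract does not spell out how suggested frames are chosen; one plausible realization, sketched below under that assumption, ranks unlabeled frames by the mean prediction entropy of the segmentation model and surfaces the most uncertain ones to the annotator.

```python
# Hypothetical sketch of suggestive frame selection for LiDAR annotation.
# The paper's actual criterion (class-representative frames) may differ.
import torch

def suggest_frames(model, frames, k=10):
    """frames: list of (N_points, C_in) tensors; returns indices of k frames."""
    scores = []
    with torch.no_grad():
        for x in frames:
            probs = torch.softmax(model(x), dim=-1)                   # (N_points, n_classes)
            entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)  # (N_points,)
            scores.append(entropy.mean().item())
    order = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    return order[:k]  # most uncertain frames first
```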

전이 학습 기반의 생성 이미지 판별 모델 설계 (Transfer Learning-based Generated Synthetic Images Identification Model)

  • 김채원;윤성연;한명은;박민서
    • The Journal of the Convergence on Culture Technology / Vol. 10, No. 2 / pp. 465-470 / 2024
  • With advances in artificial intelligence (AI)-based image generation, a wide variety of synthetic images are being produced, and techniques that can identify them accurately are needed. Because the amount of available generated-image data is limited, this study proposes a model that identifies generated images using transfer learning to achieve high performance with limited data. A model pretrained on the ImageNet dataset is applied directly to the CIFAKE input dataset to reduce training time, and the model is then tuned by adding three hidden layers and one output layer. The modeling results confirm that performance improves when the final layers are adjusted. The significance of this work is that, by adding and adjusting the layers close to the output to fit the characteristics of the data after transfer learning, it mitigates the accuracy problems caused by small image datasets and enables identification of generated images.
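A sketch of the described tuning under stated assumptions: a frozen ImageNet-pretrained ResNet50 stands in for the unnamed pretrained model, and the widths of the three added hidden layers are illustrative.

```python
# Sketch: freeze a pretrained backbone and add three hidden layers plus one
# output layer for real-vs-generated classification on CIFAKE.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for p in backbone.parameters():
    p.requires_grad = False            # reuse ImageNet features as-is

backbone.fc = nn.Sequential(           # three hidden layers + one output layer
    nn.Linear(backbone.fc.in_features, 512), nn.ReLU(),
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 32), nn.ReLU(),
    nn.Linear(32, 2),                  # real vs. generated
)

# Only the new head is trained, which keeps training time low.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
```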

Joint Demosaicing and Super-resolution of Color Filter Array Image based on Deep Image Prior Network

  • Kurniawan, Edwin;Lee, Suk-Ho
    • International Journal of Advanced Smart Convergence / Vol. 11, No. 2 / pp. 13-21 / 2022
  • In this paper, we propose a learning-based joint demosaicing and super-resolution framework that uses only the mosaicked color filter array (CFA) image as input. Because the proposed method works on the mosaicked CFA image itself, no large dataset is needed. Within this framework, we propose two structures: the first uses one deep image prior network, while the second uses two. Experimental results show that even though only the CFA image is used as the training image, the proposed method achieves better visual quality than demosaicing methods combined with bilinear interpolation, and it therefore opens up a new research area for joint demosaicing and super-resolution on raw images.
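A toy sketch of the deep image prior mechanism at the core of the framework: a randomly initialized network is fitted so that its output, sampled through the Bayer mask, matches the observed CFA image. The joint super-resolution part and the paper's actual network design are omitted; everything below is an illustrative assumption.

```python
# Sketch: deep-image-prior demosaicing. Fit a random network so that its
# output agrees with the observed CFA samples under an RGGB Bayer mask.
import torch
import torch.nn as nn

H, W = 128, 128
cfa = torch.rand(1, 3, H, W)            # placeholder observed mosaic (RGB planes)

# Bayer mask: which of R, G, B is observed at each pixel (RGGB pattern).
mask = torch.zeros(1, 3, H, W)
mask[:, 0, 0::2, 0::2] = 1  # R
mask[:, 1, 0::2, 1::2] = 1  # G
mask[:, 1, 1::2, 0::2] = 1  # G
mask[:, 2, 1::2, 1::2] = 1  # B

net = nn.Sequential(                    # toy stand-in for the prior network
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)
z = torch.randn(1, 32, H, W)            # fixed random input code
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    opt.zero_grad()
    out = net(z)                                     # candidate full-color image
    loss = ((out * mask - cfa * mask) ** 2).mean()   # match only observed samples
    loss.backward()
    opt.step()

demosaiced = net(z).detach()            # final reconstruction
```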

Arabic Words Extraction and Character Recognition from Picturesque Image Macros with Enhanced VGG-16 based Model Functionality Using Neural Networks

  • Ayed Ahmad Hamdan Al-Radaideh;Mohd Shafry bin Mohd Rahim;Wad Ghaban;Majdi Bsoul;Shahid Kamal;Naveed Abbas
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 17, No. 7 / pp. 1807-1822 / 2023
  • Innovation and rapidly increasing functionality in user-friendly smartphones have encouraged photographers to capture picturesque image macros in the work environment or during travel. Formal signboards are placed with marketing objectives and are enriched with text to attract people. Extracting and recognizing text from natural images is an emerging research issue that needs attention. Compared with conventional optical character recognition (OCR), the complex backgrounds, implicit noise, lighting, and orientation of these scenic text photos make the problem more difficult, and Arabic text adds further complications. The method described in this paper uses a two-phase approach to extract Arabic text, with word-boundary awareness, from scenic images with varying text orientations. The first phase uses a convolutional autoencoder, and the second uses Arabic Character Segmentation (ACS) followed by a traditional two-layer neural network for recognition. This study also presents how an Arabic synthetic training dataset can be created to exemplify text superimposed on different scene images: a dataset of 10k cropped images containing Arabic text was created for the detection phase, along with a 127k Arabic character dataset for the recognition phase. The phase-1 labels were generated from an Arabic corpus of 15k quotes and sentences. The Arabic Word Awareness Region Detection (AWARD) approach is used to detect these texts with high flexibility, handling complex Arabic scenes such as arbitrarily oriented, curved, or deformed text. Our experiments show that the system achieves 91.8% word segmentation accuracy and 94.2% character recognition accuracy. We believe that future researchers can improve scene-text processing in any language by enhancing the functionality of the VGG-16 based model using neural networks.
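A minimal sketch of the recognition phase's traditional two-layer neural network; the crop size and the number of character classes are assumptions, since the abstract does not specify them.

```python
# Sketch: two-layer neural network classifying segmented Arabic character crops.
import torch
import torch.nn as nn

n_classes = 28                          # assumed: base Arabic letters; the
                                        # paper's class set may differ
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32, 256), nn.ReLU(), # layer 1: assumed 32x32 grayscale crops
    nn.Linear(256, n_classes),          # layer 2: per-character scores
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(crops, labels):          # crops: (B,1,32,32); labels: (B,)
    optimizer.zero_grad()
    loss = criterion(model(crops), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```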