• Title/Summary/Keyword: Dataset Augmentation

Comparative study of data augmentation methods for fake audio detection (음성위조 탐지에 있어서 데이터 증강 기법의 성능에 관한 비교 연구)

  • KwanYeol Park;Il-Youp Kwak
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.101-114 / 2023
  • Data augmentation is an effective way to reduce model overfitting by letting the model see the training dataset from various perspectives. In addition to image augmentation techniques such as rotation, cropping, horizontal flip, and vertical flip, occlusion-based data augmentation methods such as Cutmix and Cutout have been proposed. For models based on speech data, occlusion-based augmentation can be applied after converting the 1D speech signal into a 2D spectrogram. In particular, SpecAugment is an occlusion-based augmentation technique for speech spectrograms. This study compares data augmentation techniques that can be used for fake audio detection. Using data from the ASVspoof2017 and ASVspoof2019 competitions on fake audio detection, an LCNN model was trained on datasets augmented with the occlusion-based methods Cutout, Cutmix, and SpecAugment. All three augmentation techniques generally improved model performance. The best performance came from Cutmix on ASVspoof2017, Mixup on ASVspoof2019 LA, and SpecAugment on ASVspoof2019 PA. In addition, increasing the number of masks for SpecAugment helped improve performance. In conclusion, the appropriate augmentation technique differs depending on the situation and the data.
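
The occlusion idea described in this abstract can be illustrated with a minimal sketch of SpecAugment-style masking. This is not the authors' implementation: the function name `mask_spectrogram`, its parameters, and the choice of 0.0 as the fill value are assumptions made for illustration, and the spectrogram is assumed to be a plain list of rows (frequency bins by time frames).

```python
import random


def mask_spectrogram(spec, num_freq_masks=1, num_time_masks=1,
                     max_freq_width=2, max_time_width=2, rng=None):
    """Return a masked copy of `spec` (rows = frequency bins, columns = time frames).

    Random frequency bands and time spans are overwritten with 0.0.
    All names and defaults here are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    n_freq = len(spec)
    n_time = len(spec[0])
    out = [list(row) for row in spec]  # copy so the input stays untouched

    # Frequency masking: zero out `width` consecutive frequency bins.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_freq))
        start = rng.randint(0, n_freq - width)
        for f in range(start, start + width):
            out[f] = [0.0] * n_time

    # Time masking: zero out `width` consecutive time frames in every bin.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_time))
        start = rng.randint(0, n_time - width)
        for row in out:
            for t in range(start, start + width):
                row[t] = 0.0
    return out
```

Raising `num_freq_masks` or `num_time_masks` corresponds to the "number of masks" that the abstract reports as helpful; in practice an array library would normally replace the nested lists.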

3D Medical Image Data Augmentation for CT Image Segmentation (CT 이미지 세그멘테이션을 위한 3D 의료 영상 데이터 증강 기법)

  • Seonghyeon Ko;Huigyu Yang;Moonseong Kim;Hyunseung Choo
    • Journal of Internet Computing and Services / v.24 no.4 / pp.85-92 / 2023
  • Deep learning applications are increasingly being used for disease detection in medical imaging modalities such as X-ray, Computed Tomography (CT), and Magnetic Resonance Imaging (MRI). Most data-centric deep learning challenges require supervised learning to attain high accuracy and to allow performance evaluation against the ground truth. Supervised learning requires a substantial number of image and label sets; however, procuring enough medical imaging data for training is a formidable task. Various data augmentation strategies can mitigate the underfitting issue inherent in supervised learning-based models trained on limited medical image and label sets. This research investigates the enhancement of a deep learning-based rib fracture segmentation model and the efficacy of data augmentation techniques such as left-right flipping, rotation, and scaling. Datasets augmented with L/R flipping and rotations (30°, 60°) increased model performance, whereas datasets augmented with 90° rotation and ×0.5 rescaling decreased it. This indicates that data augmentation methods should be chosen to suit the dataset and task.
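
For segmentation, any geometric augmentation has to be applied identically to the image and its label mask. The sketch below shows this for left-right flipping only. It is an illustration rather than the paper's code: the function names and the `volume[z][y][x]` axis order are assumptions, and nested lists stand in for the arrays a real pipeline would use.

```python
def flip_left_right(volume):
    """Mirror a 3D volume stored as volume[z][y][x] along the x axis."""
    return [[row[::-1] for row in plane] for plane in volume]


def augment_pair(image, label):
    """Apply the same left-right flip to a CT volume and its label mask.

    Flipping both keeps every labelled voxel aligned with the anatomy it marks.
    """
    return flip_left_right(image), flip_left_right(label)
```

Flipping a pair twice returns the original volumes, which is a quick sanity check that image and label stay in step.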

A Deep Learning Approach for Classification of Cloud Image Patches on Small Datasets

  • Phung, Van Hiep;Rhee, Eun Joo
    • Journal of information and communication convergence engineering / v.16 no.3 / pp.173-178 / 2018
  • Accurate classification of cloud images is a challenging task. Almost all existing methods rely on hand-crafted feature extraction, which limits their discriminative power. In recent years, deep learning with convolutional neural networks (CNNs), which can extract features automatically, has achieved promising results in many computer vision and image understanding fields. However, deep learning approaches usually need large datasets. This paper proposes a deep learning approach for classification of cloud image patches on small datasets. First, we design a suitable deep learning model for small datasets using a CNN, and then we apply data augmentation and dropout regularization techniques to improve the generalization of the model. The experiments for the proposed approach were performed on the small SWIMCAT dataset with k-fold cross-validation. The experimental results demonstrated perfect classification accuracy for most classes on every fold, and confirmed both the high accuracy and the robustness of the proposed model.

Human Detection using Real-virtual Augmented Dataset

  • Jongmin, Lee;Yongwan, Kim;Jinsung, Choi;Ki-Hong, Kim;Daehwan, Kim
    • Journal of information and communication convergence engineering / v.21 no.1 / pp.98-102 / 2023
  • This paper presents a study on how augmenting with semi-synthetic image data improves the performance of human detection algorithms. In the field of object detection, securing a high-quality dataset plays the most important role in training deep learning algorithms. Recently, the acquisition of real image data has become time-consuming and expensive; therefore, research using synthesized data has been conducted. Synthetic data have the advantage that vast amounts can be generated and accurately labeled. However, the utility of synthetic data in human detection has not yet been demonstrated. Therefore, we use You Only Look Once (YOLO), the most commonly used object detection algorithm, to experimentally analyze the effect of synthetic data augmentation on human detection performance. Training YOLO on the Penn-Fudan dataset showed that the YOLO network model trained on a dataset augmented with synthetic data provided high-performance results in terms of the Precision-Recall Curve and F1-Confidence Curve.

Flow Assessment and Prediction in the Asa River Watershed using different Artificial Intelligence Techniques on Small Dataset

  • Kareem Kola Yusuff;Adigun Adebayo Ismail;Park Kidoo;Jung Younghun
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.95-95 / 2023
  • Common hydrological problems of developing countries include poor data management, insufficient measuring devices, and ungauged watersheds, leading to small or unreliable datasets. This has greatly affected the adoption of artificial intelligence techniques for flood risk mitigation and damage control in several developing countries. Climate datasets have been applied with resounding success, but they exhibit more uncertainties than ground-based measurements. To encourage AI adoption in developing countries with small ground-based datasets, we propose data augmentation for regression tasks and compare the performance of different AI models with and without data augmentation. More focus is placed on simple models that offer lower computational cost and higher accuracy than deeper models that train longer and consume computing resources, which may be insufficient in developing countries. To implement this approach, we modelled and predicted streamflow data of the Asa River Watershed located in Ilorin, Kwara State, Nigeria. Results revealed that adequate hyperparameter tuning and proper model selection improve streamflow prediction on small water datasets. This approach can be implemented in data-scarce regions to ensure that timely flood intervention and early warning systems are adopted in developing countries.

Synthetic Data Augmentation for Plant Disease Image Generation using GAN (GAN을 이용한 식물 병해 이미지 합성 데이터 증강)

  • Nazki, Haseeb;Lee, Jaehwan;Yoon, Sook;Park, Dong Sun
    • Proceedings of the Korea Contents Association Conference / 2018.05a / pp.459-460 / 2018
  • In this paper, we present a data augmentation method that generates synthetic plant disease images using Generative Adversarial Networks (GANs). We propose a training scheme that first uses classical data augmentation techniques to enlarge the training set and then further enlarges the data size and its diversity by applying GAN techniques for synthetic data augmentation. Our method is demonstrated on a limited dataset of 2789 images of tomato plant diseases (Gray mold, Canker, Leaf mold, Plague, Leaf miner, Whitefly etc.).

Object Detection Accuracy Improvements of Mobility Equipments through Substitution Augmentation of Similar Objects (유사물체 치환증강을 통한 기동장비 물체 인식 성능 향상)

  • Heo, Jiseong;Park, Jihun
    • Journal of the Korea Institute of Military Science and Technology / v.25 no.3 / pp.300-310 / 2022
  • A vast amount of labeled data is required for deep neural network training. A typical strategy to improve the performance of a neural network given a training dataset is to use a data augmentation technique. The goal of this work is to offer a novel image augmentation method for improving object detection accuracy. An object in an image is removed, and a similar object from the training dataset is placed in its area. An in-painting algorithm fills the part of the removed area that the similar object does not cover. In our testing on a military vehicle dataset using the YOLOv4 object detector, the technique improves mAP by up to 2.32 percent.

An Efficient Data Augmentation for 3D Medical Image Segmentation (3차원 의료 영상의 영역 분할을 위한 효율적인 데이터 보강 방법)

  • Park, Sangkun
    • Journal of Institute of Convergence Technology / v.11 no.1 / pp.1-5 / 2021
  • Deep learning-based methods achieve state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. In many medical applications, labeling medical images is known to require significant expertise and time, and typical hand-tuned approaches to data augmentation fail to capture the complex variations in such images. This paper proposes a 3D image augmentation method to overcome these difficulties. It enriches the diversity of training data samples, which is essential in medical image segmentation tasks, and thus reduces the overfitting problem caused by the typically small scale of medical image datasets. Our numerical experiments demonstrate that the proposed approach provides significant improvements over state-of-the-art methods for 3D medical image segmentation.

Dataset Augmentation Technique for Crack Detection of Wood Building (목조건물 크랙 감지를 위한 데이터셋 증강 기법)

  • Kim, Beom-Jun;Kim, Inki;Lim, Hyunseok;Gwak, Jeonghwan
    • Proceedings of the Korean Society of Computer Information Conference / 2021.07a / pp.645-647 / 2021
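  • This paper proposes a technique that augments a dataset by moving only the cracks in images of wooden buildings. To create training data for crack detection in images, instead of performing data augmentation by applying transformations such as flip, rotation, shift, and rescale to the whole image, the technique generates new data using only a single object, the crack. Because the object is manipulated only within the region of interest, more data can be obtained than with existing methods, and because the crack does not move outside the region of interest, the resulting data contain no outliers or missing values. New data can also be created by arbitrarily generating cracks in images that contain none. In conclusion, this paper proposes a data augmentation method for crack detection training that outperforms existing methods.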
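
One way to picture the object-only augmentation described here is to paste a crack patch at every offset that keeps it inside a region of interest. The sketch below is a loose illustration under stated assumptions, not the authors' method: `shifted_crack_images`, the patch representation (0 means no crack pixel), and the `(top, left, bottom, right)` ROI format are all hypothetical.

```python
def shifted_crack_images(background, crack, roi):
    """Yield copies of `background` with `crack` pasted at each valid offset.

    background: 2D list of pixel values.
    crack: 2D list where 0 means "no crack pixel" (assumed convention).
    roi: (top, left, bottom, right), with bottom and right exclusive.
    Only offsets that keep the whole patch inside the ROI are used,
    so the crack never leaves the region of interest.
    """
    top, left, bottom, right = roi
    crack_h, crack_w = len(crack), len(crack[0])
    for y in range(top, bottom - crack_h + 1):
        for x in range(left, right - crack_w + 1):
            image = [list(row) for row in background]  # fresh copy per sample
            for dy in range(crack_h):
                for dx in range(crack_w):
                    if crack[dy][dx]:
                        image[y + dy][x + dx] = crack[dy][dx]
            yield image
```

Passing a crack-free image as `background` corresponds to the abstract's point that cracks can also be generated in images that originally contain none.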

A Scheme for Preventing Data Augmentation Leaks in GAN-based Models Using Auxiliary Classifier (보조 분류기를 이용한 GAN 모델에서의 데이터 증강 누출 방지 기법)

  • Shim, Jong-Hwa;Lee, Ji-Eun;Hwang, Een-Jun
    • Journal of IKEEE / v.26 no.2 / pp.176-185 / 2022
  • Data augmentation is a general approach to mitigating overfitting of machine learning models by applying various transformations and distortions to the dataset. However, when data augmentation is applied to a GAN-based model, a deep learning image generation model, the transformations and distortions are reflected in the generated images, and image quality decreases. To prevent this problem, called augmentation leak, we propose a scheme that prevents augmentation leak regardless of the type and number of augmentations. Specifically, we analyze the conditions under which augmentation leak occurs for each augmentation type and implement an auxiliary augmentation task classifier that prevents it. Through experiments, we show that the proposed technique prevents augmentation leak in the GAN model and, as a result, improves the quality of the generated images. We also demonstrate the superiority of the proposed scheme through an ablation study and a comparison with other representative augmentation leak prevention techniques.