• Title/Abstract/Keywords: Synthetic data generation

Search results: 115 items (processing time: 0.032 seconds)

Enhanced Regular Expression as a DGL for Generation of Synthetic Big Data

  • Kai, Cheng;Keisuke, Abe
    • Journal of Information Processing Systems / Vol. 19, No. 1 / pp. 1-16 / 2023
  • Synthetic data generation is generally used in performance evaluation and function tests in data-intensive applications, as well as in various areas of data analytics, such as privacy-preserving data publishing (PPDP) and statistical disclosure limitation/control. A significant amount of research has been conducted on tools and languages for data generation. However, existing tools and languages have been developed for specific purposes and are unsuitable for other domains. In this article, we propose a regular expression-based data generation language (DGL) for flexible big data generation. To achieve a general-purpose and powerful DGL, we enhanced standard regular expressions to support data domains, type/format inference, sequence and random generation, probability distributions, and resource references. To implement the proposed language efficiently, we propose caching techniques for both the intermediate and database queries. We evaluated the proposed improvement experimentally.

Synthetic Image Generation for Military Vehicle Detection

  • 오세윤;양훈민
    • 한국군사과학기술학회지 / Vol. 26, No. 5 / pp. 392-399 / 2023
  • This paper investigates the effectiveness of computer graphics (CG)-based synthetic data for deep learning in military vehicle detection. In particular, we explore the use of synthetic image generation techniques to train deep neural networks for object detection tasks. Our approach generates a large dataset of synthetic images of military vehicles, which is then used to train a deep learning model. The resulting model is evaluated on real-world images to measure its effectiveness. Our experimental results show that synthetic training data alone can achieve effective results in object detection. Our findings demonstrate the potential of CG-based synthetic data for deep learning and suggest its value as a tool for training models in a variety of applications, including military vehicle detection.

Generation of Synthetic Particle Images for Particle Image Velocimetry using Physics-Informed Neural Network

  • 최현조;신명현;박종호;박진수
    • 한국가시화정보학회지 / Vol. 21, No. 1 / pp. 119-126 / 2023
  • Acquiring experimental data for PIV verification or for machine learning training is resource-demanding, which has led to increasing interest in synthetic particle images as simulation data. Conventional synthetic particle image generation algorithms do not follow physical laws, and the use of CFD is time-consuming and requires computing resources. In this study, we propose a new method for synthetic particle image generation based on a physics-informed neural network (PINN). The PINN is used to infer the flow fields, enabling the generation of synthetic particle images that follow physical laws with reduced computation time and without the constraints on spatial resolution that CFD imposes. The proposed method is expected to contribute to the verification of PIV algorithms.

Game Engine Driven Synthetic Data Generation for Computer Vision-Based Construction Safety Monitoring

  • Lee, Heejae;Jeon, Jongmoo;Yang, Jaehun;Park, Chansik;Lee, Dongmin
    • International Conference Proceedings / The 9th International Conference on Construction Engineering and Project Management / pp. 893-903 / 2022
  • Recently, computer vision (CV)-based safety monitoring (i.e., object detection) systems have been widely researched in the construction industry. Sufficient, high-quality data collection is required to detect objects accurately. Such data collection is particularly important for detecting small objects or handling images taken from different camera angles. Although several previous studies proposed novel data augmentation and synthetic data generation approaches, the problem is still not thoroughly addressed (i.e., accuracy remains limited) in the dynamic construction work environment. In this study, we propose a game engine-driven synthetic data generation model to enhance the accuracy of the CV-based object detection model, mainly targeting small objects. In the virtual 3D environment, we generated synthetic data to complement training images by altering the virtual camera angles. The main contribution of this paper is to confirm whether synthetic data generated in the game engine can improve the accuracy of the CV-based object detection model.

A Study on Synthetic Flight Vehicle Trajectory Data Generation Using Time-series Generative Adversarial Network and Its Application to Trajectory Prediction of Flight Vehicles

  • 박인희;이창진;정찬호
    • 전기전자학회논문지 / Vol. 25, No. 4 / pp. 766-769 / 2021
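  • Performing tasks such as trajectory design, control, optimization, and prediction for flight vehicles with machine learning techniques, including deep learning, requires at least a certain amount of flight vehicle trajectory data. However, for various reasons (for example, the cost, time, and personnel needed to build a flight vehicle trajectory dataset), it can be difficult to secure that amount of data. In such cases, synthetic data generation can be one way to make machine learning feasible. To explore this possibility, this paper generates and evaluates synthetic flight vehicle trajectory data using a time-series generative adversarial network. In addition, various ablation studies (comparative experiments) are conducted to explore the usability of synthetic data in the flight vehicle trajectory prediction task, which serves to recognize the state of a flight vehicle. The generation evaluation and comparative experiment results presented in this paper are expected to be of practical help to researchers who wish to study synthetic flight vehicle trajectory data generation and the usability of synthetic data in trajectory-related tasks.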

Synthetic Infra-Red Image Dataset Generation by CycleGAN based on SSIM Loss Function

  • 이하늘;이현재
    • 한국군사과학기술학회지 / Vol. 25, No. 5 / pp. 476-486 / 2022
  • Generating synthetic dynamic infrared images from a given virtual environment is a primary goal in simulating the output of an infrared (IR) camera installed on a vehicle, in order to evaluate control algorithms for various search and reconnaissance missions. Because actual IR data are difficult to obtain in complex environments, artificial intelligence (AI) has recently been used for image data generation. In this paper, the CycleGAN technique is applied to obtain more realistic synthetic IR images. We added a Structural Similarity Index Measure (SSIM) loss function to the L1 loss function so that CycleGAN generates more realistic synthetic IR images. The simulation results indicate that the synthetic infrared images generated by the proposed technique are applicable to guided-missile flight simulation tests.
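
    One way to read the combined objective in this abstract is as an SSIM term added to the L1 cycle-consistency term of CycleGAN. The abstract does not give the exact formulation, so the form below, the weights $\lambda_{1}$ and $\lambda_{\mathrm{SSIM}}$, and the choice to apply SSIM to the reconstructed image are assumptions made for illustration only. Here $G$ maps a source image $x$ to the IR domain and $F$ maps it back:

    ```latex
    \mathcal{L}_{\mathrm{cyc}}(G, F)
      = \lambda_{1}\,\lVert F(G(x)) - x \rVert_{1}
      + \lambda_{\mathrm{SSIM}}\,\bigl(1 - \mathrm{SSIM}\bigl(F(G(x)),\, x\bigr)\bigr)
    ```

    Because SSIM equals 1 for identical images, the term $1 - \mathrm{SSIM}$ vanishes at perfect structural agreement, so it can be minimized alongside the L1 term.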

A comparison of synthetic data approaches using utility and disclosure risk measures

  • 안성빈;트랑 도안;이주희;김지우;김용재;김윤지;윤창원;정성규;김동하;권성훈;김항준;안정연;박철우
    • 응용통계연구 / Vol. 36, No. 2 / pp. 141-166 / 2023
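  • Generating and releasing synthetic data is a representative way to prevent the risk of information leakage that comes with data disclosure. As the use of data in industry has recently grown in importance, research on synthetic data is being actively conducted in many countries and institutions, including Korea. This paper introduces representative synthetic data generation techniques and evaluation measures. By describing the process of generating synthetic data with multiple imputation, a traditional synthetic data generation method, and with recently proposed artificial neural network-based methods, it aids an overall understanding of synthetic data generation methods. In addition, by analyzing and comparing the generated synthetic data using a variety of evaluation measures, it aims to suggest directions for future research and lay a foundation for it.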

Studies on the Stochastic Generation of Long Term Runoff (2)

  • 이순혁;맹승진;박종국
    • 한국농공학회지 / Vol. 35, No. 3 / pp. 117-129 / 1993
  • This study was conducted to obtain reasonable and abundant hydrological time series of monthly flows, simulated by a best-fitting stochastic simulation model, for the rational design and management of agricultural hydraulic structures, including reservoirs. A comparative analysis was carried out of both the statistical characteristics and the synthetic monthly flows simulated by the multi-season first-order Markov model based on the Gamma distribution, which was confirmed as a good model in the first report of this study, and by the Harmonic synthetic model analyzed in this report, for the six watersheds of the Yeong San and Seom Jin river systems. 1. Arithmetic mean values of synthetic monthly flows simulated with the Gamma distribution are much closer to those of the observed data than those of the Harmonic synthetic model in the applied watersheds. 2. In a comparison of the coefficients of variation and the index of fluctuation for monthly flows simulated by the two synthetic models, those based on the Gamma distribution appear closer to the observed data than those of the Harmonic synthetic model in both the Yeong San and Seom Jin river systems. 3. Synthetic monthly flows based on the Gamma distribution were found to give better results than those of the Harmonic synthetic model in the applied watersheds. 4. Further studies comparing other simulation techniques are desirable to obtain a reasonable generation technique for synthetic monthly flows for the various river systems in Korea.

Generation of Synthetic Time Series Wind Speed Data using Second-Order Markov Chain Model

  • 유기완
    • 풍력에너지저널 / Vol. 14, No. 1 / pp. 37-43 / 2023
  • In this study, synthetic time series wind data were generated numerically using a second-order Markov chain. One year of wind data measured in 2020 by the AWS on Wido Island was used to investigate the statistics of the measured wind data. Both the transition probability matrix and the cumulative transition probability matrix for the annual hourly mean wind speed were obtained through statistical analysis. The probability density distribution over wind speed and the autocorrelation over time were compared between the first- and second-order Markov chains for various lengths of time series wind data. The probability density distributions of the measured wind data and of the synthetic wind data from the first- and second-order Markov chains were also compared with each other. For the second-order Markov chain, some improvement in the autocorrelation was verified. It turns out that the autocorrelation converges to zero as the wind speed increases when the data size is sufficiently large. The generation of artificial wind data is expected to be useful as input data for virtual digital twin wind turbines.
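
    The procedure in this abstract (estimate a transition probability matrix, accumulate it, then sample) can be sketched as follows. This is a minimal illustration and not the author's implementation: the 1 m/s binning, the function names `fit_second_order` and `generate`, the fallback for unseen state pairs, and the stand-in `measured` series are all assumptions.

    ```python
    import bisect
    import random
    from collections import defaultdict
    from itertools import accumulate


    def fit_second_order(states):
        """Count transitions (s[t-2], s[t-1]) -> s[t]; return cumulative probabilities per pair."""
        counts = defaultdict(lambda: defaultdict(int))
        for a, b, c in zip(states, states[1:], states[2:]):
            counts[(a, b)][c] += 1
        model = {}
        for pair, nxt in counts.items():
            values = sorted(nxt)
            total = sum(nxt.values())
            cum = list(accumulate(nxt[v] / total for v in values))
            cum[-1] = 1.0  # guard against floating-point round-off
            model[pair] = (values, cum)
        return model


    def generate(model, start_pair, n, rng):
        """Generate n states by inverse-transform sampling on the cumulative rows."""
        series = list(start_pair)
        pairs = sorted(model)
        while len(series) < n:
            pair = (series[-2], series[-1])
            if pair not in model:
                # Assumption of this sketch: draw from a random observed pair
                # when the current pair never appeared in the training data.
                pair = rng.choice(pairs)
            values, cum = model[pair]
            series.append(values[bisect.bisect_left(cum, rng.random())])
        return series[:n]


    # Hypothetical stand-in for measured hourly mean wind speeds (m/s).
    measured = [3.2, 4.1, 4.8, 5.5, 4.9, 4.2, 3.7, 3.1, 3.9, 4.6, 5.2, 4.4] * 50
    states = [int(round(v)) for v in measured]  # assumed 1 m/s bins
    model = fit_second_order(states)
    synthetic = generate(model, (states[0], states[1]), 24, random.Random(0))
    ```

    A first-order chain would key the model on `series[-1]` alone. Conditioning on the last two states is what lets the second-order chain retain more of the temporal correlation, at the cost of needing more data to populate each row.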

A Study on Synthetic Data Generation Based Safe Differentially Private GAN

  • 강준영;정수용;홍도원;서창호
    • 정보보호학회논문지 / Vol. 30, No. 5 / pp. 945-956 / 2020
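  • Data disclosure is essential for receiving high-quality services from many applications. However, releasing the original data as-is risks exposing individuals' sensitive information (political orientation, illness, etc.), so many studies have proposed preserving privacy by generating and releasing synthetic data instead of the original data. Yet simply generating and releasing synthetic data still carries a risk of privacy leakage through various attacks (linkage attacks, inference attacks, etc.). To prevent the leakage of such sensitive information, this paper proposes a privacy-preserving synthetic data generation algorithm that applies differential privacy, a state-of-the-art privacy protection technique, to GAN, which is attracting attention as a synthetic data generation model. The generative model uses CGAN for efficient learning of labeled data and, considering data utility, applies Rényi differential privacy, which relaxes privacy relative to conventional differential privacy. The utility of the generated data is then verified and comparatively analyzed using various classifiers.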