• Title/Summary/Keyword: synthetic data

Generating and Validating Synthetic Training Data for Predicting Bankruptcy of Individual Businesses

  • Hong, Dong-Suk;Baik, Cheol
    • Journal of information and communication convergence engineering / v.19 no.4 / pp.228-233 / 2021
  • In this study, we analyze the credit information (loans, delinquency information, etc.) of individual business owners and use a partial synthesis technique to generate large volumes of training data for a bankruptcy prediction model. We then compare the prediction performance obtained with the generated data against that obtained with the actual data. In the experiments (a logistic regression task), training data generated with a conditional tabular generative adversarial network (CTGAN) improved recall by a factor of 1.75 compared with the actual data. The probability that the actual and generated data are sampled from an identical distribution is verified to be well above 80%. Credit rating and default risk prediction for individual businesses have been relatively under-researched, and providing artificial intelligence training data for these fields through data synthesis can promote further in-depth research using such methods.
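To make the recall comparison in this abstract concrete, the following is a minimal sketch of how recall and a relative improvement factor can be computed. The labels and predictions are toy values invented for illustration (they are not the study's data), and the helper name `recall` is hypothetical.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy ground truth: 8 bankrupt (1) and 4 non-bankrupt (0) businesses.
y_true = [1] * 8 + [0] * 4

# Hypothetical predictions from a model trained on actual data ...
pred_actual = [1, 1, 1, 1, 0, 0, 0, 0] + [0, 0, 0, 0]
# ... and from a model trained on synthetic data.
pred_synth = [1, 1, 1, 1, 1, 1, 1, 0] + [0, 1, 0, 0]

recall_actual = recall(y_true, pred_actual)
recall_synth = recall(y_true, pred_synth)
improvement = recall_synth / recall_actual

print(recall_actual, recall_synth, improvement)
```

The improvement factor is simply the ratio of the two recall values; the toy predictions were chosen so that the ratio happens to equal 1.75.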

Improve object recognition using UWB SAR imaging with compressed sensing

  • Pham, The Hien;Hong, Ic-Pyo
    • Journal of IKEEE / v.25 no.1 / pp.76-82 / 2021
  • In this paper, the compressed sensing basis pursuit denoising algorithm is applied to synthetic aperture radar imaging to improve object recognition. For incomplete data sets, the compressed sensing algorithm recovers the data before the conventional back-projection algorithm is applied to obtain the synthetic aperture radar images. This method can reduce the number of measurements needed while scanning the objects. An ultra-wideband (UWB) radar scheme using a stripmap synthetic aperture radar algorithm was used to detect objects hidden behind the box. A UWB radar system operating over the 3.1-4.8 GHz band with a UWB antenna was implemented to transmit and receive signal data for two conductive cylinders located inside the paper box. The results confirmed that the images can be reconstructed from a randomly selected 30% of the data without noticeable distortion compared with images generated from the full data using the conventional back-projection algorithm.

A comparison of synthetic data approaches using utility and disclosure risk measures (유용성과 노출 위험성 지표를 이용한 재현자료 기법 비교 연구)

  • Seongbin An;Trang Doan;Juhee Lee;Jiwoo Kim;Yong Jae Kim;Yunji Kim;Changwon Yoon;Sungkyu Jung;Dongha Kim;Sunghoon Kwon;Hang J Kim;Jeongyoun Ahn;Cheolwoo Park
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.141-166 / 2023
  • This paper investigates synthetic data generation methods and their evaluation measures. There have been increasing demands for releasing various types of data to the public for different purposes. At the same time, there are unavoidable concerns about leaking critical or sensitive information. Many synthetic data generation methods have been proposed over the years to address these concerns, and some have been implemented in several countries, including Korea. The current study introduces and compares three representative synthetic data generation approaches: sequential regression, nonparametric Bayesian multiple imputation, and deep generative models. Several evaluation metrics that measure the utility and disclosure risk of synthetic data are also reviewed. We provide empirical comparisons of the three approaches with respect to various evaluation measures. The findings of this work will help practitioners better understand the advantages and disadvantages of these synthetic data methods.

Enhanced Regular Expression as a DGL for Generation of Synthetic Big Data

  • Kai, Cheng;Keisuke, Abe
    • Journal of Information Processing Systems / v.19 no.1 / pp.1-16 / 2023
  • Synthetic data generation is generally used in performance evaluation and function tests in data-intensive applications, as well as in various areas of data analytics, such as privacy-preserving data publishing (PPDP) and statistical disclosure limitation/control. A significant amount of research has been conducted on tools and languages for data generation. However, existing tools and languages have been developed for specific purposes and are unsuitable for other domains. In this article, we propose a regular expression-based data generation language (DGL) for flexible big data generation. To achieve a general-purpose and powerful DGL, we enhanced the standard regular expressions to support the data domain, type/format inference, sequence and random generation, probability distributions, and resource reference. To efficiently implement the proposed language, we propose caching techniques for both the intermediate and database queries. We evaluated the proposed improvement experimentally.
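As a rough illustration of the general idea of driving data generation from a regular expression, here is a minimal sketch that produces random strings from a very small pattern subset (literals, `[a-z]`-style classes, `\d`, and `{n}` repetition). It is not the DGL proposed in the article and supports none of its extensions; the function names `generate` and `expand_class` are hypothetical.

```python
import random
import re
import string


def expand_class(body):
    """Expand the inside of a character class such as 'A-Z' or 'abc0-9'."""
    chars = []
    k = 0
    while k < len(body):
        if k + 2 < len(body) and body[k + 1] == "-":
            chars.extend(chr(c) for c in range(ord(body[k]), ord(body[k + 2]) + 1))
            k += 3
        else:
            chars.append(body[k])
            k += 1
    return "".join(chars)


def generate(pattern, rng=None):
    """Generate one random string matching a tiny regex subset."""
    if rng is None:
        rng = random.Random()
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":                      # character class
            j = pattern.index("]", i)
            choices = expand_class(pattern[i + 1:j])
            i = j + 1
        elif ch == "\\":                   # only \d is supported in this sketch
            if pattern[i + 1] != "d":
                raise ValueError("unsupported escape")
            choices = string.digits
            i += 2
        else:                              # literal character
            choices = ch
            i += 1
        count = 1
        if i < len(pattern) and pattern[i] == "{":   # fixed repetition {n}
            j = pattern.index("}", i)
            count = int(pattern[i + 1:j])
            i = j + 1
        out.append("".join(rng.choice(choices) for _ in range(count)))
    return "".join(out)


pattern = r"[A-Z]{2}-\d{4}"
sample = generate(pattern, random.Random(0))
print(sample, bool(re.fullmatch(pattern, sample)))
```

Checking each generated value with `re.fullmatch` against the same pattern is a cheap sanity test that the generator and the standard regex semantics agree on this subset.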

Perception Ability of Synthetic Vowels in Cochlear Implanted Children (모음의 포먼트 변형에 따른 인공와우 이식 아동의 청각적 인지변화)

  • Huh, Myung-Jin
    • MALSORI / no.64 / pp.1-14 / 2007
  • The purpose of this study was to examine how auditory perception changes with formant modification in profoundly hearing-impaired children with cochlear implants. The subjects were 10 children with 15 months of implant experience; their mean chronological age was 8.4 years, with a standard deviation of 2.9 years. Auditory perception ability was assessed using acoustic-synthetic vowels. Each acoustic-synthetic vowel was formed by combining F1, F2, and F3, and 42 synthetic sounds were produced using the Speech GUI (Graphic User Interface) program. The data were analyzed with cluster analysis and on-line analytical processing to assess the perception of the acoustic-synthetic vowels. The results showed that the children's auditory perception scores were higher for F2 synthetic vowels than for F1 synthetic vowels. The children also perceived differences between vowels in terms of the distance ratio between F1 and F2 in specific vowels.

A Study on the Synthetic ECG Generation for User Recognition (사용자 인식을 위한 가상 심전도 신호 생성 기술에 관한 연구)

  • Kim, Min Gu;Kim, Jin Su;Pan, Sung Bum
    • Smart Media Journal / v.8 no.4 / pp.33-37 / 2019
  • Because ECG signals are time-series data acquired over time, it is important to obtain comparison data of the same size as the enrolled data every time. This paper proposes a GAN (generative adversarial network) model based on an auxiliary classifier that generates synthetic ECG signals to address the problem of differing data sizes. Cosine similarity and cross-correlation are used to examine the similarity of the synthetic ECG signals. The analysis shows an average cosine similarity of 0.991 and an average cross-correlation-based Euclidean distance similarity of 0.25. These results indicate that the data size problem can be resolved, because synthetic ECG signals similar to real ECG signals can be generated even when the registered data and the comparison data differ in size.
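For readers unfamiliar with the two similarity measures named in this abstract, the following is a minimal sketch of cosine similarity and lag-based cross-correlation on two short sequences. The signal values are toy numbers invented for illustration, the function names are hypothetical, and the paper's exact "Euclidean distance similarity based on cross-correlation" is not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cross_correlation(a, b, lag):
    """Unnormalized cross-correlation: sum over n of a[n] * b[n + lag]."""
    return sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))


def best_lag(a, b, max_lag):
    """Lag in [-max_lag, max_lag] at which the cross-correlation peaks."""
    return max(range(-max_lag, max_lag + 1), key=lambda k: cross_correlation(a, b, k))


# Toy "real" beat and a "synthetic" beat that is the same shape delayed by one sample.
real = [0, 0, 1, 3, 1, 0, 0, 0]
synthetic = [0, 0, 0, 1, 3, 1, 0, 0]

print(cosine_similarity(real, synthetic))
print(best_lag(real, synthetic, max_lag=3))
```

Cosine similarity compares the two signals sample by sample, so a small time shift lowers it, whereas the cross-correlation peak identifies the shift at which the shapes align best.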

A study on the Conservation of Historic Timber Architecture by Synthetic Resin in Korea (합성수지를 사용한 목조건조물문화재 보존처리 사례 연구 - 한국과 일본의 보존처리 사례를 중심으로 -)

  • Cho, Hyun-Jung;Kim, Wang-Jik
    • Journal of architectural history / v.15 no.1 s.45 / pp.41-60 / 2006
  • Preservation of wooden architecture by means of synthetic resin involves both physical and chemical work. Synthetic resins are used for the consolidation and restoration of decayed members. Since 1978, synthetic resin has been used for the preservation of architectural heritage in Korea; the first case was Chimgyeru of Songgwang-temple in Suncheon city. In the 1980s, attention to conservation materials for architectural heritage began, influenced by the principle of authenticity in the Venice Charter of 1964. In Korea, the preservation of wooden architecture with synthetic resin uses many kinds of epoxies, including Araldite XN1023 and SV427. The use of synthetic resin in the restoration of architectural heritage has both merits and demerits. The merit is that less of the structure has to be replaced with new members during preservation work. The demerit is the irreversibility of the epoxy resin. In 1999, the ICOMOS International Wood Committee recommended that contemporary materials and techniques be chosen and used with the greatest caution, and that preservation work be reversible as far as technically possible. Therefore, data on the preservation of wooden architecture with synthetic resin should be collected continuously, because such data are very important for this work. In the long term, new materials and techniques should also be considered as alternatives to synthetic resin.

Review of Data-Driven Multivariate and Multiscale Methods

  • Park, Cheolsoo
    • IEIE Transactions on Smart Processing and Computing / v.4 no.2 / pp.89-96 / 2015
  • In this paper, two time-frequency analysis algorithms, empirical mode decomposition and local mean decomposition, are reviewed, and their applications to nonlinear and nonstationary real-world data are discussed. In addition, their generic extensions to the complex domain are addressed for the analysis of multichannel data. Simulations of these algorithms on synthetic data illustrate their fundamental structure and how they are designed for the analysis of nonlinear and nonstationary data. Applications of the complex versions of the algorithms to the synthetic data also demonstrate their benefit for the accurate frequency decomposition of multichannel data.

Synthetic Infra-Red Image Dataset Generation by CycleGAN based on SSIM Loss Function (SSIM 목적 함수와 CycleGAN을 이용한 적외선 이미지 데이터셋 생성 기법 연구)

  • Lee, Sky;Leeghim, Henzeh
    • Journal of the Korea Institute of Military Science and Technology / v.25 no.5 / pp.476-486 / 2022
  • Generating synthetic dynamic infrared images from a given virtual environment is the primary goal in simulating the output of an infrared (IR) camera installed on a vehicle, so that control algorithms for various search and reconnaissance missions can be evaluated. Because actual IR data are difficult to obtain in complex environments, artificial intelligence (AI) has recently been used in the field of image data generation. In this paper, the CycleGAN technique is applied to obtain more realistic synthetic IR images. We added a Structural Similarity Index Measure (SSIM) loss function to the L1 loss function so that CycleGAN generates more realistic synthetic IR images. The simulation shows that the synthetic infrared images generated by the proposed technique are applicable to guided-missile flight simulation tests.
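The idea of adding an SSIM term to an L1 loss can be sketched as follows. This is a simplified illustration, not the paper's implementation: SSIM is computed globally over the whole image rather than over sliding windows, images are flat lists of pixel values in [0, 1], and the weight `ssim_weight` and all function names are assumptions introduced here.

```python
def mean(values):
    return sum(values) / len(values)


def ssim(x, y, data_range=1.0):
    """Global (single-window) SSIM between two equal-length pixel lists."""
    c1 = (0.01 * data_range) ** 2   # commonly used stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mx, my = mean(x), mean(y)
    vx = mean([(a - mx) ** 2 for a in x])
    vy = mean([(b - my) ** 2 for b in y])
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return numerator / denominator


def l1_loss(x, y):
    """Mean absolute pixel difference."""
    return mean([abs(a - b) for a, b in zip(x, y)])


def combined_loss(x, y, ssim_weight=1.0):
    """L1 loss plus a weighted (1 - SSIM) structural term."""
    return l1_loss(x, y) + ssim_weight * (1.0 - ssim(x, y))


# Toy 2x2 "images" flattened to lists (illustrative values only).
target = [0.1, 0.5, 0.9, 0.3]
inverted = [0.9, 0.5, 0.1, 0.7]

print(combined_loss(target, target))
print(combined_loss(target, inverted))
```

Because SSIM equals 1 for identical images, the structural term vanishes when the generated image matches the target and grows as local structure diverges, which complements the purely pixel-wise L1 term.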

Performance Analysis of Deep Learning-Based Detection/Classification for SAR Ground Targets with the Synthetic Dataset (합성 데이터를 이용한 SAR 지상표적의 딥러닝 탐지/분류 성능분석)

  • Ji-Hoon Park
    • Journal of the Korea Institute of Military Science and Technology / v.27 no.2 / pp.147-155 / 2024
  • Building on recent advances in deep learning, many studies have examined deep learning networks that simultaneously detect and classify targets of interest in synthetic aperture radar (SAR) images. Although numerous results have been obtained, mainly with open SAR ship datasets, little work has addressed deep learning networks that detect and classify SAR ground targets and are trained with synthetic datasets generated from electromagnetic scattering simulations. This paper therefore presents a deep learning network trained with such a synthetic dataset and applies it to detecting and classifying real SAR ground targets. Using experimental results, the paper also analyzes network performance as a function of the ratio between real measured data and synthetic data used in training. Finally, a summary and the limitations are discussed to inform future research directions.