• Title/Abstract/Keywords: Tabular Data

Search results: 51 (processing time: 0.02 s)

공공기술 사업화를 위한 CTGAN 기반 데이터 불균형 해소 (Resolving CTGAN-based data imbalance for commercialization of public technology)

  • 황철현
    • 한국정보통신학회논문지 (Journal of the Korea Institute of Information and Communication Engineering) / Vol. 26, No. 1 / pp. 64-69 / 2022
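  • Commercialization of public technology, the transfer of government-led scientific and technological innovation and R&D results to the private sector, is recognized as a key outcome that drives economic growth. Various machine learning methods have therefore been studied to promote technology transfer. Some identify success factors; others match public technologies with high commercialization potential to the companies that demand them. However, public technology commercialization data are tabular and highly imbalanced, with a large gap between the numbers of successes and failures, so machine learning performance is poor. This paper presents a method that uses CTGAN to resolve the imbalance in tabular public technology data. To verify the effectiveness of the proposed method, we conducted comparative experiments against SMOTE, a statistical approach, using real public technology commercialization data. In many of the experimental cases, CTGAN stably predicted successful public technology commercialization cases.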
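The abstract benchmarks CTGAN against SMOTE. Training a CTGAN is too involved for a short example, so the following covers only the SMOTE baseline. It is a minimal sketch that assumes purely numeric feature vectors. Each synthetic minority sample is a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote_oversample` and its defaults are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists of floats) from the
    minority class, e.g. successful commercialization cases (assumed input).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x (Euclidean), excluding x itself
        neighbours = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(x, m),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_rows = smote_oversample(minority, n_new=6, k=2)
print(len(new_rows))  # 6
```

Each new row is a convex combination of two real minority rows, so it stays inside the region the minority class already occupies. CTGAN instead learns a conditional generator over mixed discrete and continuous columns, which is the approach the paper evaluates.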

Generating and Validating Synthetic Training Data for Predicting Bankruptcy of Individual Businesses

  • Hong, Dong-Suk;Baik, Cheol
    • Journal of Information and Communication Convergence Engineering / Vol. 19, No. 4 / pp. 228-233 / 2021
  • In this study, we analyze the credit information (loans, delinquency information, etc.) of individual business owners. Using a partial synthetic training technique, we generate a large volume of training data for a bankruptcy prediction model. We then compare the prediction performance of the newly generated data with that of the actual data. In the experiments (a logistic regression task), training on data generated with conditional tabular generative adversarial networks (CTGAN) improved recall by a factor of 1.75 compared with training on the actual data. The probability that the actual and generated data are drawn from the same distribution was verified to be well above 80%. Credit rating and default risk prediction for individual businesses have seen relatively little research. Providing artificial intelligence training data for these fields through data synthesis promotes further in-depth research on such methods.
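The 1.75× figure in the abstract is a ratio of recall values. The sketch below shows what that ratio measures: it computes recall and compares two models by it. The labels are invented toy values, not the study's data, and the function name `recall` is illustrative.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN): share of actual positives (e.g. bankruptcies) caught."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels for illustration only; these are not the paper's data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_trained_on_real = [1, 0, 0, 0, 0, 0, 0, 0]
pred_trained_on_synthetic = [1, 1, 0, 0, 0, 1, 0, 0]

r_real = recall(y_true, pred_trained_on_real)         # 0.25
r_synth = recall(y_true, pred_trained_on_synthetic)   # 0.5
print(r_synth / r_real)                               # 2.0
```

Recall ignores false positives, so a higher recall on an imbalanced bankruptcy task should be read alongside precision or a similar metric.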

Automatic Generation of Machine Readable Context Annotations for SPARQL Results

  • Choi, Ji-Woong
    • 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information) / Vol. 21, No. 10 / pp. 1-10 / 2016
  • In this paper, we propose an approach to generating machine-readable context annotations for SPARQL results. According to the W3C Recommendations, data retrieved from RDF or OWL data sources are represented in tabular form, in which each cell is described only by its type and value. This simple result form is generally useful, but it is not sufficient to explain the semantics of the data in query results. To explain the meaning of the data, appropriate annotations must be added to the query results. In this paper, we generate the annotations from the basic graph patterns in users' queries. We can also modify the original queries to complete the annotations. In our study, the generated annotations are represented using RDFa syntax. RDFa expressions embedded in HTML are machine-understandable. We believe that our work will improve the trustworthiness of query results and help distribute data in line with the vision of the Semantic Web.
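The core idea, deriving annotations from the query's basic graph pattern, can be sketched as follows. The sketch assumes that each result row is a dict of variable bindings and that objects beginning with `http` are IRIs. The function `annotate_row` is hypothetical and omits the paper's query-modification step. Unbound variables are left as-is.

```python
from html import escape


def annotate_row(bgp, binding):
    """Render one SPARQL result row as RDFa by substituting the row's
    variable bindings into the query's triple patterns.

    bgp: list of (subject, predicate, object) patterns; variables start with '?'.
    binding: dict mapping variable names (without '?') to values.
    """
    def resolve(term):
        return binding.get(term[1:], term) if term.startswith("?") else term

    spans = []
    for s, p, o in bgp:
        subj, obj = resolve(s), resolve(o)
        if obj.startswith(("http://", "https://")):
            # IRI object: RDFa 1.1 takes the object from the resource attribute
            spans.append(f'<span about="{escape(subj)}" property="{escape(p)}" resource="{escape(obj)}"></span>')
        else:
            # literal object: the element content becomes the literal value
            spans.append(f'<span about="{escape(subj)}" property="{escape(p)}">{escape(obj)}</span>')
    return "\n".join(spans)


bgp = [("?person", "http://xmlns.com/foaf/0.1/name", "?name")]
row = {"person": "http://example.org/alice", "name": "Alice"}
print(annotate_row(bgp, row))
# <span about="http://example.org/alice" property="http://xmlns.com/foaf/0.1/name">Alice</span>
```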

수용가 전력 소비 패턴을 고려한 배전용 변압기 과부하 판정기준 (Overload Criteria of Distribution Transformers Considering the Electric Consumption Patterns of Customers)

  • 윤상윤;김재철
    • 대한전기학회논문지:전력기술부문A (Transactions of the Korean Institute of Electrical Engineers: Power Technology Section A) / Vol. 53, No. 9 / pp. 513-520 / 2004
  • In this paper, we summarize the results of experimental research on overload criteria for domestic distribution transformers that take the electric consumption patterns of customers into account. Actual experiments were performed to obtain the basic characteristic data of distribution transformer overload. To analyze customer load consumption patterns, field load data were surveyed from sample transformers. Load data acquisition devices were installed, and a load pattern classification algorithm was applied. In addition, various historical load pattern data were gathered. From these, the representative load pattern of each domestic customer type was extracted. The final overload criteria are presented in tabular form by combining the experimental and survey results. The experimental results were field-tested using specially manufactured transformers that can measure both the load and the top-oil temperature of the transformer. Through this, we verify that the field test results are similar to the laboratory results and that the proposed overload criteria can be effectively applied to the real system.
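The abstract does not name its load pattern classification algorithm. The sketch below shows one common approach and assumes 24 hourly readings per day. It normalises each daily profile by its peak, so that shape rather than magnitude is compared. It then assigns the profile to the nearest representative pattern by Euclidean distance. The representative patterns and all names below are hypothetical.

```python
import math


def normalise(profile):
    """Scale a daily load profile so its peak equals 1.0 (compare shape only)."""
    peak = max(profile)
    return [v / peak for v in profile] if peak > 0 else list(profile)


def classify_profile(profile, representatives):
    """Return the customer type whose representative pattern is closest."""
    shape = normalise(profile)
    return min(
        representatives,
        key=lambda name: math.dist(shape, normalise(representatives[name])),
    )


# Hypothetical 24-hour representative shapes (hour 0..23), not from the paper.
residential = [0.4] * 7 + [0.6] * 11 + [1.0] * 5 + [0.5]
commercial = [0.3] * 8 + [1.0] * 10 + [0.4] * 6
reps = {"residential": residential, "commercial": commercial}

measured = [12] * 7 + [18] * 11 + [30] * 5 + [15]  # kW, evening peak
print(classify_profile(measured, reps))  # residential
```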

계절성 시계열 자료의 concept drift 탐지를 위한 새로운 창 전략 (A novel window strategy for concept drift detection in seasonal time series)

  • 이도운;배수민;김강섭;안순홍
    • 한국정보처리학회:학술대회논문집 (Korea Information Processing Society: Conference Proceedings) / Korea Information Processing Society 2023 Spring Conference / pp. 377-379 / 2023
  • Concept drift detection on data streams is a major issue in maintaining the performance of machine learning models. Because an online stream is a function of time, classical statistical methods are hard to apply. For seasonal time series in particular, however, a novel window strategy based on Fourier analysis makes it possible to adapt classical methods to the series. We explore the KS test as an adaptation for periodic time series and show that this strategy lets a complicated time series be handled like an ordinary tabular dataset. Compared with arbitrary window sizes, detection with this strategy ranks second in time delay and achieves the best false alarm rate and detection accuracy.
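The sketch below illustrates the idea under two steps. First, it estimates the seasonal period from the strongest discrete Fourier component. Second, it runs a two-sample KS test on the two most recent period-length windows, so that both windows cover the same seasonal phases. The window rule (exactly one dominant period) and all function names are assumptions for illustration. The paper's actual window strategy may differ.

```python
import cmath
import math
from bisect import bisect_right


def dominant_period(series):
    """Estimate the seasonal period (in samples) from the largest DFT magnitude."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coef) > best_mag:
            best_k, best_mag = k, abs(coef)
    return round(n / best_k)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two ECDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b)) for v in a + b)


def drift_in_last_period(series, period, alpha=0.05):
    """KS-test the last full period against the period before it."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full periods")
    ref, cur = series[-2 * period:-period], series[-period:]
    # asymptotic critical value c(alpha) * sqrt((n + m) / (n * m)) with n = m = period
    crit = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt(2 / period)
    return ks_statistic(ref, cur) > crit


history = [math.sin(2 * math.pi * t / 12) for t in range(120)]
p = dominant_period(history)
stable = history[:24]
shifted = history[:12] + [v + 5 for v in history[12:24]]
print(p, drift_in_last_period(stable, p), drift_in_last_period(shifted, p))  # 12 False True
```

Aligning each window to one full period means the seasonal cycle itself does not register as drift. Only a change in the distribution across the same phases does.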

Development Of A Windows-Based Predictive Model For Estimating Sediment Resuspension And Contaminant Release From Dredging Operations

  • Je, Chung-Hwan;Kim, Kyung-Sub
    • Water Engineering Research / Vol. 1, No. 2 / pp. 137-146 / 2000
  • A Windows-based software package named DREDGE is developed for estimating sediment resuspension and contaminant release during dredging operations. DREDGE allows users to enter the necessary dredge information, site characteristics, operational data, and contaminant characteristics. It then calculates an array of concentrations from the given values. The program mainly consists of near-field models, obtained empirically, for estimating sediment resuspension and far-field models, obtained analytically, for suspended sediment transport. A linear equilibrium partitioning approach is applied to estimate particulate and dissolved contaminant concentrations. The package, which requires only a minimal amount of data, consists of three components: user input, tabular output, and graphical output. Combining the near-field and far-field models into a user-friendly Windows-based computer program can greatly reduce the effort dredge operators, planners, and regulators spend estimating sediment transport and contaminant distribution.
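The linear equilibrium partitioning step can be written out explicitly. The sketch assumes the standard linear relation C_p = K_d · TSS · C_d between particulate and dissolved concentrations, with C_p expressed per litre of water. Under that relation, the dissolved fraction is 1 / (1 + K_d · TSS). The function name and units are assumptions, not DREDGE's actual interface.

```python
def partition(total_conc, kd_l_per_kg, tss_mg_per_l):
    """Split a total contaminant concentration into dissolved and particulate
    parts under linear equilibrium partitioning: C_p = Kd * TSS * C_d.

    total_conc: total contaminant concentration (any mass/volume unit, e.g. ug/L)
    kd_l_per_kg: partition coefficient Kd in L/kg
    tss_mg_per_l: total suspended solids in mg/L
    """
    tss_kg_per_l = tss_mg_per_l * 1e-6  # mg/L -> kg/L
    dissolved = total_conc / (1 + kd_l_per_kg * tss_kg_per_l)
    particulate = total_conc - dissolved
    return dissolved, particulate


# Illustrative values only: Kd = 1000 L/kg and TSS = 1000 mg/L give Kd * TSS = 1,
# so the total splits evenly between the two phases.
d, p = partition(10.0, 1000.0, 1000.0)
print(round(d, 6), round(p, 6))  # 5.0 5.0
```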


An Ensemble Model for Credit Default Discrimination: Incorporating BERT-based NLP and Transformer

  • Sophot Ky;Ju-Hong Lee
    • 한국정보처리학회:학술대회논문집 (Korea Information Processing Society: Conference Proceedings) / Korea Information Processing Society 2023 Spring Conference / pp. 624-626 / 2023
  • Credit scoring is a technique used by financial institutions to assess the creditworthiness of potential borrowers. It involves evaluating a borrower's credit history to predict the likelihood of defaulting on a loan. This paper presents an ensemble of two Transformer-based models within a framework for discriminating the default risk of loan applications in credit scoring. The first model is FinBERT, a pretrained NLP model for analyzing the sentiment of financial text. The second model is FT-Transformer, a simple adaptation of the Transformer architecture to the tabular domain. Both models are trained on the same underlying data set and differ only in how the data are represented. This multi-modal approach allows us to leverage the unique capabilities of each model and potentially uncover insights that may not be apparent when using a single model alone. We compare our model with two well-known ensemble-based models, Random Forest and Extreme Gradient Boosting.
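The abstract says both models see the same data in different representations. It does not spell out the text encoding or the fusion rule. The sketch below rests on two assumptions. First, each tabular record is serialised into a plain sentence for the NLP model. Second, the two models' default probabilities are combined by weighted soft voting. The function names, field names, and equal weighting are hypothetical.

```python
def row_to_text(row):
    """Serialise one tabular loan record as a sentence for a text model (assumed format)."""
    return ". ".join(f"{name.replace('_', ' ')} is {value}" for name, value in row.items()) + "."


def soft_vote(p_text, p_tabular, w_text=0.5):
    """Weighted average of the two models' predicted default probabilities."""
    if not 0.0 <= w_text <= 1.0:
        raise ValueError("w_text must lie in [0, 1]")
    return w_text * p_text + (1.0 - w_text) * p_tabular


applicant = {"loan_amount": 12000, "annual_income": 45000, "delinquencies": 1}
print(row_to_text(applicant))
# loan amount is 12000. annual income is 45000. delinquencies is 1.
print(round(soft_vote(0.30, 0.50), 2))  # 0.4
```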

Visualization of Dynamic Simulation Data for Power System Stability Assessment

  • Song, Chong-Suk;Jang, Gil-Soo;Park, Chang-Hyun
    • Journal of Electrical Engineering and Technology / Vol. 6, No. 4 / pp. 484-492 / 2011
  • Power system analyses, which involve the handling of massive data volumes, necessitate the use of effective visualization methods to facilitate analysis and assist the user in obtaining a clear understanding of the present state of the system. This paper introduces an interface that compensates for the limitations of the visualization modules of dynamic security assessment tools, such as PSS/e and TSAT, for power system variables including generator rotor angle and frequency. The compensation is made possible through the automatic provision of dynamic simulation data in visualized and tabular form for better data intuition, thereby considerably reducing the redundant manual operation and time required for data analysis. The interface also determines whether the generators are stable through a generator instability algorithm that scans simulation data and checks for an increase in swing or divergence. The proposed visualization methods are applied to the dynamic simulation results for contingencies in the Korean Electric Power Corporation system, and have been tested by power system researchers to verify the effectiveness of the data visualization interface.
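The abstract describes the instability check only as scanning simulation data for increasing swing or divergence. The sketch below shows one such rule. It assumes the trace is a list of rotor angles in degrees relative to a reference machine. It also assumes that divergence means exceeding 180° and that increasing swing means every local peak is higher than the previous one. These thresholds and criteria are assumptions, not the interface's documented algorithm.

```python
import math


def is_unstable(angles_deg, divergence_limit=180.0):
    """Flag a generator rotor-angle trace as unstable (assumed rules).

    Unstable if the angle diverges past divergence_limit, or if the
    successive swing peaks keep growing instead of damping out.
    """
    if any(abs(a) > divergence_limit for a in angles_deg):
        return True  # divergence
    peaks = [
        angles_deg[i]
        for i in range(1, len(angles_deg) - 1)
        if angles_deg[i - 1] < angles_deg[i] >= angles_deg[i + 1]
    ]
    # increasing swing: every local maximum is higher than the previous one
    return len(peaks) >= 2 and all(b > a for a, b in zip(peaks, peaks[1:]))


damped = [30 * math.exp(-0.1 * t) * math.sin(0.5 * t) for t in range(100)]
growing = [10 * math.exp(0.02 * t) * math.sin(0.5 * t) for t in range(100)]
diverging = [2.0 * t for t in range(200)]
print(is_unstable(damped), is_unstable(growing), is_unstable(diverging))  # False True True
```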

SSR-Primer Generator: A Tool for Finding Simple Sequence Repeats and Designing SSR-Primers

  • Hong, Chang-Pyo;Choi, Su-Ryun;Lim, Yong-Pyo
    • Genomics & Informatics / Vol. 9, No. 4 / pp. 189-193 / 2011
  • Simple sequence repeats (SSRs) are ubiquitous short tandem duplications found within eukaryotic genomes. Their length variability and abundance throughout the genome have led to their wide use as molecular markers in crop-breeding programs, facilitating marker-assisted selection as well as estimation of genetic population structure. Here, we report a software application, "SSR-Primer Generator", for SSR discovery, SSR-primer design, and homology-based search of in silico amplicons from a DNA sequence dataset. When multiple FASTA-format DNA sequences are submitted, these analyses are batch-processed in a pipeline on the Java Runtime Environment (JRE) platform, and the resulting data are displayed in HTML tabular format. This application will be a useful tool for reducing the time and cost of developing and applying SSR markers.
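SSR discovery amounts to scanning for short motifs repeated in tandem. The sketch below does this with regular expressions. It assumes motif lengths of 1 to 6 bases and at least five consecutive copies. These thresholds and the function name are illustrative assumptions rather than SSR-Primer Generator's actual settings. Primer design and homology search are not covered.

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Return sorted (start, motif, repeat_count) tuples for tandem repeats in seq."""
    seq = seq.upper()
    found = []
    for unit in range(min_unit, max_unit + 1):
        # a motif of `unit` bases followed by at least min_repeats - 1 copies of itself
        pattern = re.compile(rf"([ACGT]{{{unit}}})\1{{{min_repeats - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # skip motifs that are themselves repeats of a shorter unit (e.g. "ATAT")
            if any(unit % k == 0 and motif == motif[:k] * (unit // k) for k in range(1, unit)):
                continue
            found.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(found)


print(find_ssrs("GGG" + "AC" * 6 + "TTT"))  # [(3, 'AC', 6)]
```

The backreference `\1` forces every copy to match the first motif exactly, so imperfect (interrupted) repeats are not reported by this sketch.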

국내 15개 주요지역의 난방도일 재산정에 관한 연구 (Study on the Revision of HDD for 15 Main Cities of Korea)

  • 조성환;김성수;최창용
    • 설비공학논문집 (Korean Journal of Air-Conditioning and Refrigeration Engineering) / Vol. 22, No. 7 / pp. 436-441 / 2010
  • The purpose of this study is to revise the HDD (heating degree-days) of the main cities of Korea, because global warming has recently accelerated the rise in outdoor temperature. The HDD currently used for the utility design of district heating systems was established 20 years ago. New heating degree-days for the main cities of Korea were therefore required, and they were determined from outdoor temperature data measured over 30 years. For the HDD analysis, five different base temperatures ranging from 16°C to 24°C were chosen. The new annual heating degree-days for 15 cities of Korea are given in tabular form.
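Heating degree-days for a base temperature T_b are conventionally the sum over days of max(0, T_b − T_mean), where T_mean is the daily mean outdoor temperature. The sketch below takes daily mean temperatures as input. The abstract does not state which daily-mean definition or which five base temperatures were used, so the values below are illustrative placeholders.

```python
def heating_degree_days(daily_means_c, base_c):
    """Sum of max(0, base - daily mean temperature) over all days, in degC-days."""
    return sum(max(0.0, base_c - t) for t in daily_means_c)


temps = [10.0, 20.0, 15.0]  # toy daily means in degC, not measured data
print(heating_degree_days(temps, 18.0))  # 11.0
print({b: heating_degree_days(temps, b) for b in (16.0, 20.0, 24.0)})
# {16.0: 7.0, 20.0: 15.0, 24.0: 27.0}
```

Days warmer than the base temperature contribute zero rather than a negative amount, so raising the base temperature can only increase the annual total.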