• Title/Summary/Keyword: RFECV


A Study on Accounting Fraud Detection using Neural Network and Random Forest (인공신경망 및 랜덤포레스트 기법을 활용한 기업 분식회계 탐지 성능 평가 연구)

  • Dong-Hyeok Hwang;Yeong-Seok Seo
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.692-693 / 2023
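  • As ESG management has grown in importance, so has the question of whether a company has committed accounting fraud. This paper therefore compares the performance of an artificial neural network and a random forest in determining whether a company has committed accounting fraud, and evaluates their usefulness. Experiments were run on collected real corporate accounting data. By F1-score, the random forest with the RFECV technique detected fraudulent companies at 0.81 and the model using the SMOTE technique detected normal companies; by accuracy, the random forest models using the RFECV technique and the SMOTE technique showed the most effective detection performance at 0.77.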
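The RFECV technique named in this abstract (recursive feature elimination with cross-validation) can be sketched with scikit-learn. This is a minimal illustration and not the authors' setup: the synthetic dataset, the number of trees, the three folds and the F1 scoring choice are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for accounting features (assumption: the real data is not public here).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, n_redundant=2,
    weights=[0.7, 0.3], random_state=0,
)

# Recursively drop the least important feature and keep the feature count
# that gives the best cross-validated F1-score.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=1,
    cv=StratifiedKFold(n_splits=3),
    scoring="f1",
    min_features_to_select=1,
)
selector.fit(X, y)

X_selected = selector.transform(X)  # keeps only the selected columns
print("features kept:", selector.n_features_)
print("selection mask:", selector.support_)
```

`selector.support_` is a boolean mask over the original columns, so it can be used to report which features survived elimination.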

Anomaly Detection Model Based on Semi-Supervised Learning Using LIME: Focusing on Semiconductor Process (LIME을 활용한 준지도 학습 기반 이상 탐지 모델: 반도체 공정을 중심으로)

  • Kang-Min An;Ju-Eun Shin;Dong Hyun Baek
    • Journal of Korean Society of Industrial and Systems Engineering / v.45 no.4 / pp.86-98 / 2022
  • Many recent studies have applied machine learning models to semiconductor manufacturing process data to improve quality. In this setting, however, good products far outnumber defective ones, so class imbalance is a serious problem for machine learning. In addition, because the data contain a very large number of features, it is important to extract only the important ones in order to increase accuracy and usability. This study proposes an anomaly detection methodology that learns effectively despite the class imbalance and high dimensionality of semiconductor process data. The methodology applies the SMOTE method and the RFECV method, and then applies the LIME algorithm. It analyzes the classification results of the anomaly classification model, identifies the cause of each anomaly, and derives the semiconductor process that requires action. Its applicability and feasibility were confirmed through a case application.
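The oversampling idea behind SMOTE, which this abstract applies before feature selection, can be sketched in plain Python. This is a simplified illustration only: the function name `smote_sketch`, the toy minority points and the value of `k` are hypothetical, and this is not the paper's implementation or a library one.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base point among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy minority class (hypothetical defective-product measurements).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_sketch(minority, n_new=6)
print(new_points)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region the minority class already occupies.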

Automatic Augmentation Technique of an Autoencoder-based Numerical Training Data (오토인코더 기반 수치형 학습데이터의 자동 증강 기법)

  • Jeong, Ju-Eun;Kim, Han-Joon;Chun, Jong-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.5 / pp.75-86 / 2022
  • This study aims to solve the problem of class imbalance in numerical data by using a deep learning-based Variational AutoEncoder, and to improve the performance of the learning model by augmenting the training data. We propose 'D-VAE' to artificially increase the number of records in given tabular data. The main feature of the proposed technique is that the data pass through discretization and feature selection during preprocessing. In the discretization step, K-means is applied to group the values, which are then converted into one-hot vectors by one-hot encoding. Subsequently, for memory efficiency, sample data are generated with the Variational AutoEncoder using only the features that RFECV, one of several feature selection techniques, identifies as helpful for prediction. To verify the proposed model, we demonstrate its validity through experiments that vary the data augmentation ratio.
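The discretization step described in this abstract (K-means grouping followed by one-hot encoding) can be approximated with scikit-learn's `KBinsDiscretizer`. This is a sketch under stated assumptions: the abstract does not say whether K-means is run per column, and the per-column binning, the three bins and the synthetic data below are choices made only for illustration.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic numerical table with two columns (assumption: stands in for real records).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0.0, 1.0, 200),
    rng.uniform(10.0, 20.0, 200),
])

# Bin edges for each column come from 1-D k-means; each bin becomes a one-hot column.
discretizer = KBinsDiscretizer(n_bins=3, encode="onehot-dense", strategy="kmeans")
X_onehot = discretizer.fit_transform(X)

print("bins per column:", discretizer.n_bins_)
print("one-hot shape:", X_onehot.shape)
```

Each row of `X_onehot` contains exactly one 1 per original column, which is the form a feature selector or an autoencoder would then consume.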