• Title/Summary/Keyword: K-fold cross-validation (K-fold 교차검증)

Search results: 48

Application of Time-series Cross Validation in Hyperparameter Tuning of a Predictive Model for 2,3-BDO Distillation Process (시계열 교차검증을 적용한 2,3-BDO 분리공정 온도예측 모델의 초매개변수 최적화)

  • An, Nahyeon;Choi, Yeongryeol;Cho, Hyungtae;Kim, Junghwan
    • Korean Chemical Engineering Research / v.59 no.4 / pp.532-541 / 2021
  • Recently, research on applying artificial intelligence to chemical processes has been increasing rapidly. However, overfitting is a significant problem. It prevents a model from generalizing beyond the observed training data to unseen test data. Cross validation is one way to address overfitting. In this study, time-series cross validation was applied to optimize the batch and epoch hyperparameters of a temperature prediction model for the 2,3-BDO distillation process. The result was compared with the commonly used K-fold cross validation. The model tuned with time-series cross validation had a 9.06% lower RMSE but a 0.61% higher MAPE than the model tuned with K-fold cross validation. Its computation time was also 198.29 s shorter.
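
The difference between the two validation schemes compared above can be sketched in plain Python. This is a minimal illustration, not the authors' setup. The function names are hypothetical, folds are unshuffled, and the expanding-window scheme follows the default convention of scikit-learn's `TimeSeriesSplit`: equal-size test blocks, with training on everything before each block. Time-series folds never train on observations that come after the test block, whereas plain K-fold does.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def kfold_splits(n_samples: int, k: int) -> Iterator[Split]:
    """Plain K-fold (hypothetical helper): contiguous, unshuffled test folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more sample
        test = list(range(start, start + size))
        # Training set includes samples both before and after the test fold.
        train = [i for i in range(n_samples) if not start <= i < start + size]
        yield train, test
        start += size


def time_series_splits(n_samples: int, n_splits: int) -> Iterator[Split]:
    """Expanding window (hypothetical helper): each test block follows its training data."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))  # only past observations
        test = list(range(test_start, test_start + test_size))
        yield train, test


if __name__ == "__main__":
    for train, test in time_series_splits(10, 4):
        print("TS    train", train, "test", test)
    for train, test in kfold_splits(10, 5):
        print("KFold train", train, "test", test)
```

Note that early time-series folds also train on fewer samples than any K-fold fold does.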

Applicability study on urban flooding risk criteria estimation algorithm using cross-validation and SVM (교차검증과 SVM을 이용한 도시침수 위험기준 추정 알고리즘 적용성 검토)

  • Lee, Hanseung;Cho, Jaewoong;Kang, Hoseon;Hwang, Jeonggeun
    • Journal of Korea Water Resources Association / v.52 no.12 / pp.963-973 / 2019
  • This study reviews an urban flooding risk criteria estimation model. The model predicts risk criteria for areas where flood risk criteria have not been precalculated, using watershed characteristic data and limit rainfall derived from damage history. It was built with the Support Vector Machine (SVM), a machine learning algorithm. The training data consisted of regional limit rainfall and watershed characteristics and were normalized before being fed to the SVM. Model performance was evaluated by computing the mean absolute error and its standard deviation under Leave-One-Out and K-fold cross validation. Under Leave-One-Out, models with a small standard deviation were selected as optimal; under K-fold, models with fewer folds were selected. The average accuracy of the selected models for each rainfall duration exceeds 80%, suggesting that SVM can be used to estimate flooding risk criteria.
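
The evaluation described above can be sketched as follows: compute the mean absolute error on each held-out fold, then the mean and standard deviation across folds. Leave-One-Out is the special case K = n. This is a hedged illustration only. A least-squares line stands in for the study's SVM so that the code runs with the standard library alone. The function names are hypothetical and the sample data are invented.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a*x + b (stand-in for the study's SVM regressor)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def fold_maes(xs: Sequence[float], ys: Sequence[float], k: int) -> List[float]:
    """MAE on each held-out fold; k == len(xs) gives Leave-One-Out (hypothetical helper)."""
    n = len(xs)
    base, extra = divmod(n, k)
    maes, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        held_out = range(start, start + size)
        train = [i for i in range(n) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        maes.append(statistics.fmean(abs(ys[i] - (a * xs[i] + b)) for i in held_out))
        start += size
    return maes


if __name__ == "__main__":
    # Invented data: one watershed feature vs. limit rainfall (mm).
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [12.0, 15.5, 17.0, 21.0, 22.5, 27.0, 28.0, 32.5]
    for name, k in [("LOO", len(xs)), ("4-fold", 4), ("2-fold", 2)]:
        maes = fold_maes(xs, ys, k)
        # Sample standard deviation of the per-fold MAEs.
        print(f"{name}: mean MAE={statistics.fmean(maes):.3f}, std={statistics.stdev(maes):.3f}")
```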

Region of Interest (ROI) Selection of Land Cover Using SVM Cross Validation (SVM 교차검증을 활용한 토지피복 ROI 선정)

  • Jeong, Jong-Chul;Youn, Hyoung-Jin
    • Journal of Cadastre & Land InformatiX / v.50 no.1 / pp.75-85 / 2020
  • This study examines the use of machine-learning cross validation to create regions of interest (ROIs) for land cover classification. The study area is located in Sejong, and a single KOMPSAT-3A image acquired on October 28, 2019 was used in the analysis. Four bands (red, green, blue, near-infrared) were used in the cross-validation learning process. K-fold cross validation was used, and SVM kernel types were applied to the cross-validation results. In addition, four SVM kernels (linear, polynomial, RBF, sigmoid) were used for supervised land cover classification with the extracted ROIs. During cross validation, 1,813 of the 3,500 samples were retained. The building, road, and grass classes lost about 60% of their samples. Using these ROIs, supervised classification with the linear SVM kernel achieved the highest classification accuracy among the kernels tested, 91.77%. The producer's accuracy for grass was 79.43%, with substantial misclassification involving forest. These results suggest that ROI extraction using cross validation may be effective for forest, water, and agricultural areas. However, the discrimination of built-up, grass, and bare-soil areas needs improvement.
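
One plausible reading of the sample-screening step above is that a training sample is kept only if a classifier trained on the other folds assigns it its labelled class. The sketch below illustrates that idea under stated assumptions. A nearest-centroid classifier stands in for the SVM, and folds are interleaved by index. The four-band values and class names are hypothetical. The abstract does not specify the exact filtering rule.

```python
import math
from typing import Dict, List, Sequence

Pixel = Sequence[float]  # assumed layout: (red, green, blue, near-infrared)


def fit_centroids(samples: List[Pixel], labels: List[str]) -> Dict[str, List[float]]:
    """Mean band vector per class (nearest-centroid stand-in for an SVM)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(centroids: Dict[str, List[float]], x: Pixel) -> str:
    """Class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def kfold_screen(samples: List[Pixel], labels: List[str], k: int) -> List[int]:
    """Indices whose out-of-fold prediction matches their label (hypothetical rule)."""
    n = len(samples)
    kept: List[int] = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]  # interleaved folds (assumption)
        train = [i for i in range(n) if i % k != fold]
        centroids = fit_centroids([samples[i] for i in train], [labels[i] for i in train])
        kept += [i for i in test if predict(centroids, samples[i]) == labels[i]]
    return sorted(kept)


if __name__ == "__main__":
    # Hypothetical samples; the last one is labelled "water" but looks like forest.
    samples = [[10, 12, 9, 5], [11, 10, 10, 6], [9, 11, 11, 4], [10, 10, 10, 5],
               [30, 40, 20, 80], [31, 41, 19, 79], [29, 39, 21, 81], [30, 40, 20, 80],
               [30, 41, 20, 80]]
    labels = ["water"] * 4 + ["forest"] * 4 + ["water"]
    print(kfold_screen(samples, labels, k=3))
```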

Rubber O-ring defect detection system using K-fold cross validation and support vector machine (K-겹 교차 검증과 서포트 벡터 머신을 이용한 고무 오링결함 검출 시스템)

  • Lee, Yong Eun;Choi, Nak Joon;Byun, Young Hoo;Kim, Dae Won;Kim, Kyung Chun
    • Journal of the Korean Society of Visualization / v.19 no.1 / pp.68-73 / 2021
  • In this study, rubber O-ring defects were detected using k-fold cross validation and a Support Vector Machine (SVM) algorithm. Data processing was carried out in three steps. First, frame alignment was performed to eliminate regions unnecessary for learning. Second, images were converted to grayscale to reduce computation. Finally, image augmentation was applied to prevent overfitting. After processing, the SVM algorithm was used to obtain detection accuracy for normal and defective O-rings. The SVM was also applied with k-fold cross validation to compare classification accuracy, and applying k-fold cross validation gave better performance.
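
The three preprocessing steps listed above can be sketched on images stored as nested lists of RGB tuples: cropping to the region of interest, grayscale conversion, and augmentation. Several assumptions apply. The crop box is hypothetical, since the abstract does not describe how frames were aligned. The ITU-R BT.601 luma weights are one common grayscale choice, not necessarily the authors'. Flips and 180° rotation serve as example augmentations.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Gray = List[List[int]]


def crop(image: List[List[RGB]], top: int, left: int, height: int, width: int) -> List[List[RGB]]:
    """Keep only the region containing the O-ring (box coordinates are hypothetical)."""
    return [row[left:left + width] for row in image[top:top + height]]


def to_grayscale(image: List[List[RGB]]) -> Gray:
    """One channel instead of three, using BT.601 luma weights (an assumption)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in image]


def augment(gray: Gray) -> List[Gray]:
    """Original plus horizontal flip, vertical flip and 180-degree rotation."""
    h_flip = [row[::-1] for row in gray]
    v_flip = gray[::-1]  # reuses row lists; safe because nothing is mutated
    rot180 = [row[::-1] for row in gray[::-1]]
    return [gray, h_flip, v_flip, rot180]


def preprocess(image: List[List[RGB]], box: Tuple[int, int, int, int]) -> List[Gray]:
    """Crop, convert to grayscale, then augment."""
    return augment(to_grayscale(crop(image, *box)))


if __name__ == "__main__":
    frame = [[(v, v, v) for v in (10, 20, 99)],
             [(v, v, v) for v in (30, 40, 99)],
             [(0, 0, 0)] * 3]
    for variant in preprocess(frame, (0, 0, 2, 2)):
        print(variant)
```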

Analysis of Credit Approval Data using Machine Learning Model (기계학습 모델을 이용한 신용 승인 데이터 분석)

  • Kim, Dong-Hyun;Kim, Se-Jun;Lee, Byung-Jun;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2019.01a / pp.41-42 / 2019
  • This paper describes credit data analysis techniques that use various machine learning models. Machine learning models fall broadly into canonical models, committee machines, and deep learning models. Using a selection of these models, the benchmark Credit Approval dataset is analyzed and performance is evaluated. The evaluation uses the k-fold evaluation method. Accuracy, precision, recall, and F1-score are used to measure the average performance over the k-fold evaluation results.
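
The averaging described above can be sketched as follows. Accuracy, precision, recall and F1-score are computed on each fold's held-out predictions, and the mean is then taken over the k folds. The function names and the toy fold results are hypothetical, and approval is assumed to be the positive class (label 1).

```python
from typing import Dict, List, Sequence


def fold_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one held-out fold."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


def mean_over_folds(per_fold: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric across the k folds."""
    return {name: sum(m[name] for m in per_fold) / len(per_fold) for name in per_fold[0]}


if __name__ == "__main__":
    # Hypothetical (true labels, predictions) for two folds.
    folds = [([1, 0, 1, 0], [1, 0, 0, 0]), ([1, 1, 0, 0], [1, 1, 0, 0])]
    print(mean_over_folds([fold_metrics(t, p) for t, p in folds]))
```

Averaging F1 per fold generally differs from computing F1 from the averaged precision and recall. In this toy case the two give 0.833 and 0.857. The abstract does not say which convention was used.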


Study on fire smoke identification method based on SVM and K fold cross verification fusion algorithm (SVM과 K 접힘 교차 검증 융합 알고리즘 기반의 화재 연기 식별 방법 연구)

  • Wang Yudong;Sangbong Park;Jeonghwa Heo
    • The Journal of the Convergence on Culture Technology / v.9 no.5 / pp.843-847 / 2023
  • As modern technology advances, chemicals and flammable substances are used ever more widely. This raises the risk of fires that can cause industrial accidents as well as farmland and large forest fires. To help prevent such fires, this paper proposes an efficient fire identification model. The paper presents an algorithm, based on a Support Vector Machine (SVM) combined with K-fold cross validation, that detects fire smoke in images efficiently and in a short time. In image-based analysis, the proposed fire and smoke detection algorithm showed relatively superior detection performance compared with existing algorithms. It analyzes the detected fire and smoke characteristics stably and efficiently and is expected to be useful in fields exposed to fire risk.

Threatening privacy by identifying appliances and the pattern of the usage from electric signal data (스마트 기기 환경에서 전력 신호 분석을 통한 프라이버시 침해 위협)

  • Cho, Jae yeon;Yoon, Ji Won
    • Journal of the Korea Institute of Information Security & Cryptology / v.25 no.5 / pp.1001-1009 / 2015
  • In a smart grid, smart meters send users' electric signal data to the power supplier's main server in real time. However, the more efficiently power loads are managed, the more likely users' usage patterns are to leak. This paper points out the privacy threat and the need for security measures in smart device environments. It does so by showing that appliances and users' specific usage patterns can be identified from smart meter data. PCA is used to reduce the dimension of the feature space, and a k-NN classifier infers the appliances and their states. Accuracy is validated with 10-fold cross validation.
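
Below is a minimal sketch of the classification and validation steps above: a k-NN classifier scored with 10-fold cross validation. Several assumptions apply. The PCA dimensionality-reduction step is omitted, so the features are treated as already reduced. The two-feature appliance readings and labels are hypothetical. Folds are interleaved by index, and accuracy is pooled over all held-out predictions.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_x: List[Sequence[float]], train_y: List[str], x: Sequence[float], k: int) -> str:
    """Majority label among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(xs: List[Sequence[float]], ys: List[str], folds: int = 10, k: int = 3) -> float:
    """Pooled accuracy over all held-out predictions from `folds`-fold cross validation."""
    n = len(xs)
    correct = 0
    for fold in range(folds):
        test = [i for i in range(n) if i % folds == fold]  # interleaved folds (assumption)
        train = [i for i in range(n) if i % folds != fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct += sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test)
    return correct / n


if __name__ == "__main__":
    # Hypothetical (real power W, reactive power var) readings for two appliances.
    xs = [[2000.0 + i, 10.0] for i in range(10)] + [[100.0 + i, 50.0] for i in range(10)]
    ys = ["kettle"] * 10 + ["tv"] * 10
    print(cross_val_accuracy(xs, ys))
```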

A Study on Deriving the Statistical Weight Estimation Formula for an Aircraft Wing (항공기 날개의 통계적 중량 예측식 도출 연구)

  • Kim, Seok-Beom;Jeong, Han-Gyu;Hwang, Ho-Yon
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.46 no.1 / pp.32-40 / 2018
  • This research studied a method for deriving a statistical weight prediction formula for use during the conceptual design phase. The method was programmed in Microsoft Excel and verified by applying it to jet transport aircraft. A database was built with reference to the variables of conventional wing weight estimation formulas and was used to model a regression equation for jet transport wing weight. The model was evaluated with K-fold cross validation to address overfitting.
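
For reference, the K-fold estimate of prediction error used to check a regression model for overfitting is commonly written as below. Here $F_k$ is the set of samples in fold $k$, $n_k = |F_k|$, and $\hat{f}^{(-k)}$ is the regression equation fitted with fold $k$ held out. The abstract does not state which error measure the authors used; mean squared error is assumed here.

```latex
\mathrm{CV}_{(K)} = \frac{1}{K}\sum_{k=1}^{K}\mathrm{MSE}_k,
\qquad
\mathrm{MSE}_k = \frac{1}{n_k}\sum_{i \in F_k}\left(y_i - \hat{f}^{(-k)}(x_i)\right)^2
```

When folds differ in size, the weighted form $\sum_{k} (n_k/n)\,\mathrm{MSE}_k$ is sometimes used instead. A training error far below $\mathrm{CV}_{(K)}$ is the usual sign that the fitted equation is overfitting.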

Implementation on the evolutionary machine learning approaches for streamflow forecasting: case study in the Seybous River, Algeria (유출예측을 위한 진화적 기계학습 접근법의 구현: 알제리 세이보스 하천의 사례연구)

  • Zakhrouf, Mousaab;Bouchelkia, Hamid;Stamboul, Madani;Kim, Sungwon;Singh, Vijay P.
    • Journal of Korea Water Resources Association / v.53 no.6 / pp.395-408 / 2020
  • This paper develops and applies three machine learning approaches for multi-step (multi-day-ahead) streamflow forecasting in a catchment in Algeria, North Africa. The approaches are artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), and wavelet-based neural networks (WNN), each combined with an evolutionary optimization algorithm and k-fold cross validation. Performance was assessed in the training and testing phases with four statistical indices: root mean squared error (RMSE), Nash-Sutcliffe efficiency (NSE), correlation coefficient (R), and peak flow criteria (PFC). By these indices, the ANN and ANFIS models performed similarly. The WNN model had lower RMSE and PFC values than the other two. For day t+1 in the testing phase, WNN reached RMSE = 8.590 m³/s and PFC = 0.252. ANN reached RMSE = 19.120 m³/s and PFC = 0.446, and ANFIS reached RMSE = 18.520 m³/s and PFC = 0.444. WNN also achieved higher NSE and R values than the ANN and ANFIS models. The proposed approach can therefore be a robust tool for multi-step (multi-day-ahead) streamflow forecasting in the Seybous River, Algeria.
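
Three of the four indices above have standard closed forms, sketched below for observed vs. simulated flow series. PFC is left out because the abstract does not give its exact definition, such as which peaks are included. The function names and sample flows are hypothetical.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean squared error, in the units of the flows (e.g. m³/s)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def pearson_r(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and simulated series."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    return cov / math.sqrt(sum((o - mo) ** 2 for o in obs) * sum((s - ms) ** 2 for s in sim))


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    biased = [2.0, 3.0, 4.0, 5.0]  # same shape, shifted up by 1
    print(rmse(observed, biased), nse(observed, biased), pearson_r(observed, biased))
```

The toy example shows why several indices are reported together. A constant bias of 1 leaves R at 1, while RMSE rises to 1 and NSE falls to 0.2.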

Diabetes Predictive Analytics using FCM Clustering based Supervised Learning Algorithm (FCM 클러스터링 기반 지도 학습 알고리즘을 이용한 당뇨병 예측 분석)

  • Park, Tae-eun;Kim, Kwang-baek
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.580-582 / 2022
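  • This paper proposes a fuzzy-clustering-based supervised learning method for quantifying data and classifying its features. The proposed method first clusters the data with the FCM clustering technique. Some of the clustered data are not classified correctly, so a supervised learning method is applied to those unclassified data. The presence or absence of diabetes is set as the target. The FCM-based supervised learning method is applied to the remaining 8 attributes to predict whether a patient has diabetes. Prediction performance was evaluated with 30 rounds of K-fold cross validation. The multilayer perceptron scored 77.88% on the training data and 62.78% on the test data. The proposed method scored 79.96% on the training data and 74.16% on the test data.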
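
The FCM clustering step can be sketched with the standard fuzzy c-means updates. Memberships are computed from relative distances with fuzzifier m, and centers are then recomputed as membership-weighted means. Several assumptions apply. The fuzzifier is m = 2 with a fixed iteration count, and centers are initialised from the first c points for reproducibility. The membership threshold that flags "not clearly classified" samples for a supervised learner is hypothetical; the abstract does not give the paper's own criterion.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def fuzzy_c_means(points: List[Point], c: int, m: float = 2.0, iters: int = 100,
                  eps: float = 1e-12) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (centers, memberships); memberships[k][i] is point k's degree in cluster i."""
    centers = [list(p) for p in points[:c]]  # deterministic init (assumption)
    exponent = 2.0 / (m - 1.0)
    u: List[List[float]] = []
    for _ in range(iters):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1)); eps avoids division by zero.
        u = []
        for x in points:
            d = [max(math.dist(x, ctr), eps) for ctr in centers]
            u.append([1.0 / sum((d[i] / d[j]) ** exponent for j in range(c)) for i in range(c)])
        # Center update: mean of the points weighted by u_ki^m.
        centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            centers.append([sum(wk * x[dim] for wk, x in zip(w, points)) / total
                            for dim in range(len(points[0]))])
    # Note: u comes from the last membership step, just before the final center update.
    return centers, u


def unclear_samples(u: List[List[float]], threshold: float = 0.9) -> List[int]:
    """Hypothetical rule: indices whose strongest membership is below `threshold`."""
    return [k for k, row in enumerate(u) if max(row) < threshold]


if __name__ == "__main__":
    pts = [[0.0], [10.0], [0.5], [9.5], [1.0], [10.5]]
    centers, u = fuzzy_c_means(pts, 2)
    print(centers, unclear_samples(u))
```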
