• Title/Abstract/Keywords: Random Forest Regression


Application of Multi-Layer Perceptron and Random Forest Method for Cylinder Plate Forming

  • 김성겸;황세윤;이장현
    • Journal of the Society of Naval Architects of Korea / Vol. 57, No. 5 / pp. 297-304 / 2020
  • This study examined a data-driven, machine learning approach to predicting the forming conditions for cylindrical plates produced with roll bending equipment. The forming variables were calculated from an analysis of the mechanical relationship between the material properties and the roll bending machine during the bending process. Finite element analysis was then used to check the accuracy of the deformation prediction model, and the same finite element model was used to generate a large dataset for machine learning. The machine learning models performed slightly better than the linear regression method, confirming that the machine learning approach yields applicable results.
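
The abstract above compares machine learning models with linear regression, and this listing was retrieved for random forest regression. As a rough illustration of how a random forest regressor works (not the authors' implementation), the sketch below grows regression trees on bootstrap samples, considers a random subset of features at each split, picks the split that minimises the summed squared error of the two children, and averages the trees' predictions. It uses only the Python standard library; the function names and hyperparameters (`n_trees`, `max_depth`, `min_samples`, `max_features`) are hypothetical choices for the example.

```python
import random
from statistics import mean


def _sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def _build_tree(X, y, depth, params, rng):
    """Grow one regression tree; a leaf is a number, an internal node a tuple."""
    max_depth, min_samples, max_features = params
    # Stop when the node is too deep, too small, or already pure.
    if depth >= max_depth or len(y) < min_samples or len(set(y)) == 1:
        return mean(y)
    n_features = len(X[0])
    # Random subset of candidate features for this split.
    features = rng.sample(range(n_features), min(max_features, n_features))
    best = None  # (score, feature, threshold)
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no candidate feature has two distinct values
        return mean(y)
    _, f, t = best
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        _build_tree([X[i] for i in left_idx], [y[i] for i in left_idx], depth + 1, params, rng),
        _build_tree([X[i] for i in right_idx], [y[i] for i in right_idx], depth + 1, params, rng),
    )


def _predict_tree(node, x):
    # Walk down the tree until a leaf (a plain number) is reached.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=6, min_samples=2, max_features=1, seed=0):
    """Fit a forest of regression trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    params = (max_depth, min_samples, max_features)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        forest.append(_build_tree([X[i] for i in idx], [y[i] for i in idx], 0, params, rng))
    return forest


def predict_forest(forest, x):
    """Regression forest output: the average of the individual tree predictions."""
    return mean(_predict_tree(tree, x) for tree in forest)
```

For example, on toy data `y = 2x + 1` for `x = 0..19`, `predict_forest(fit_forest(X, y, seed=42), [10.0])` should land near 21. A real study would normally rely on a tested library rather than this sketch.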

A Survival Prediction Model of Rats in Uncontrolled Acute Hemorrhagic Shock Using the Random Forest Classifier

  • 최준열;김성권;구정모;김덕원
    • Journal of Biomedical Engineering Research / Vol. 33, No. 3 / pp. 148-154 / 2012
  • Hemorrhagic shock is a primary cause of injury-related death worldwide. Although many studies have tried to diagnose hemorrhagic shock accurately at an early stage, these attempts have not succeeded because of human compensatory mechanisms. The objective of this study was to build a survival prediction model for rats in acute hemorrhagic shock using a random forest (RF) model. Heart rate (HR), mean arterial pressure (MAP), respiration rate (RR), lactate concentration (LC), and peripheral perfusion (PP) measured in rats were used as input variables for the RF model, and its performance was compared with that of a logistic regression (LR) model. Before building the models, we used 5-fold cross-validation for RF variable selection and forward stepwise variable selection for the LR model to determine which variables were important. For the LR model, sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (ROC-AUC) were 0.83, 0.95, 0.88, and 0.96, respectively. For the RF model, sensitivity, specificity, accuracy, and AUC were 0.97, 0.95, 0.96, and 0.99, respectively. In conclusion, the RF model outperformed the LR model for survival prediction in the rat model.
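
Sensitivity, specificity, and accuracy, as reported for both models, follow directly from confusion-matrix counts. The sketch below assumes binary labels with `1` as the positive class; which outcome (survival or death) the study treated as positive is not stated in the abstract, and the example labels are made up.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, TN, FP for a binary classifier."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy)."""
    sensitivity = tp / (tp + fn)  # share of actual positives detected
    specificity = tn / (tn + fp)  # share of actual negatives correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (3, 1, 5, 1)
print(binary_metrics(*counts))
```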

Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games

  • 오윤학;김한;윤재섭;이종석
    • Journal of the Korean Institute of Industrial Engineers / Vol. 40, No. 1 / pp. 8-17 / 2014
  • In this research, we used various data mining techniques to build predictive models of win-loss outcomes in Korean professional baseball games. Historical data on players and teams were obtained from official materials provided on the KBO website. From the collected raw data, we prepared two additional datasets, one in ratio format and one in binary format. The ratio dataset was generated by dividing the away team's records by the corresponding home team's records, and the binary dataset was obtained by comparing the two teams' record values. We applied seven classification techniques to the three (raw, ratio, and binary) datasets: decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets × 7 techniques) prediction scenarios, the most accurate model was random forest on the binary dataset, with a prediction accuracy of 84.14%. Using the ratio and binary datasets also produced better prediction models than using the raw data. Based on the variable selection capability of decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, the pitcher's winning percentage, and walks (bases on balls) are important factors in winning a game. This research differs from existing studies in that it used three different types of data and a variety of data mining techniques for win-loss prediction in Korean professional baseball games.
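
The ratio and binary datasets can be sketched as follows. The dictionary layout, the stat names, the rule that the binary value is 1 when the away team's record is larger, and returning `None` when the home value is zero are all assumptions for illustration; the abstract only says that the ratio divides away-team records by home-team records and that the binary value comes from comparing them.

```python
def make_ratio_and_binary(away, home):
    """Return (ratio, binary) feature dicts for one game.

    `away` and `home` map a stat name to that team's record value.
    """
    ratio, binary = {}, {}
    for stat, a in away.items():
        h = home[stat]
        # Ratio format: away-team record divided by home-team record.
        # Assumption: undefined (None) when the home value is zero.
        ratio[stat] = a / h if h != 0 else None
        # Binary format (assumed rule): 1 if the away value is larger, else 0.
        binary[stat] = 1 if a > h else 0
    return ratio, binary


# Hypothetical records for one game.
away = {"annual_salary": 90.0, "strikeouts": 6, "earned_runs": 2}
home = {"annual_salary": 60.0, "strikeouts": 8, "earned_runs": 0}
ratio, binary = make_ratio_and_binary(away, home)
print(ratio)   # {'annual_salary': 1.5, 'strikeouts': 0.75, 'earned_runs': None}
print(binary)  # {'annual_salary': 1, 'strikeouts': 0, 'earned_runs': 1}
```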

Predictive Model of Optimal Continuous Positive Airway Pressure for Obstructive Sleep Apnea Patients with Obesity by Using Machine Learning

  • 김승수;양광익
    • Journal of Sleep Medicine / Vol. 15, No. 2 / pp. 48-54 / 2018
  • Objectives: The aim of this study was to develop a machine learning model that predicts the optimal continuous positive airway pressure (CPAP) for obstructive sleep apnea (OSA) patients with obesity. Methods: We retrospectively reviewed the medical records of 162 OSA patients with obesity [body mass index (BMI) ≥ 25] who had undergone a successful CPAP titration study. The data were randomly divided into a training set (90%) and a test set (10%). Using the training set, we built a random forest model and a least absolute shrinkage and selection operator (lasso) regression model to predict the optimal pressure, and then applied our models and previously reported equations to the test set. Model fit was compared using the correlation coefficient (CC) and the mean absolute error (MAE). Results: The random forest model showed the best performance {CC 0.78 [95% confidence interval (CI) 0.43-0.93], MAE 1.20}. The lasso regression model also performed better [CC 0.78 (95% CI 0.42-0.93), MAE 1.26] than the Hoffstein equation [CC 0.68 (95% CI 0.23-0.89), MAE 1.34] and Choi's equation [CC 0.72 (95% CI 0.30-0.90), MAE 1.40]. Conclusions: Our random forest model and lasso model ($26.213 + 0.084 \times \text{BMI} + 0.004 \times \text{apnea-hypopnea index} + 0.004 \times \text{oxygen desaturation index} - 0.215 \times \text{mean oxygen saturation}$) performed better than the previously reported equations. Further studies of other OSA subgroups and phenotypes are required.
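
The lasso equation in the conclusions can be evaluated directly. In this sketch the function and parameter names and the example patient values are hypothetical, and the abstract does not state the units of the inputs or of the predicted pressure.

```python
def lasso_optimal_cpap(bmi, ahi, odi, mean_spo2):
    """Optimal CPAP from the lasso equation quoted in the abstract.

    bmi: body mass index, ahi: apnea-hypopnea index,
    odi: oxygen desaturation index, mean_spo2: mean oxygen saturation.
    """
    return 26.213 + 0.084 * bmi + 0.004 * ahi + 0.004 * odi - 0.215 * mean_spo2


# Hypothetical patient, not from the study.
print(round(lasso_optimal_cpap(bmi=30, ahi=50, odi=40, mean_spo2=93), 3))  # 9.098
```

The mean oxygen saturation term has by far the largest coefficient, so with the other inputs fixed, each one-point drop in mean saturation raises the predicted pressure by 0.215.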

OLE File Analysis and Malware Detection using Machine Learning

  • Choi, Hyeong Kyu;Kang, Ah Reum
    • Journal of the Korea Society of Computer and Information / Vol. 27, No. 5 / pp. 149-156 / 2022
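  • Cases of document-type malware, in which malicious code is embedded in Microsoft Office files used worldwide, have recently been increasing. Because document-type malware often hides malicious code by encoding it inside the document, it can easily bypass antivirus programs. To detect such document-type malware, we first analyzed the structure of OLE (Object Linking and Embedding) files, the format used by Microsoft Office files. We found that many kinds of malicious code were inserted into VBA (Visual Basic for Applications) macros, a feature supported by Microsoft Office, including shellcode that runs external programs and URL-related code that downloads files from external URLs. We selected 354 keywords that appear repeatedly in document-type malware and defined the number of times each keyword appears in the body as a feature. We trained SVM, naïve Bayes, logistic regression, and random forest models, which achieved accuracies of 0.994, 0.659, 0.995, and 0.998, respectively.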
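
The feature definition (the number of times each keyword appears in the macro body) amounts to a keyword-count vector. The 354 keywords are not listed in the abstract, so the four VBA-related keywords below are hypothetical stand-ins, as are the case-insensitive, whole-word matching and the assumption that the macro source has already been extracted from the OLE file.

```python
import re

# Hypothetical keyword list; the study's 354 keywords are not given in the abstract.
KEYWORDS = ["Shell", "CreateObject", "URLDownloadToFile", "AutoOpen"]


def keyword_count_features(macro_source, keywords=KEYWORDS):
    """Count how often each keyword occurs in extracted VBA macro text.

    Matching is case-insensitive (VBA itself is case-insensitive) and on
    whole words; both choices are assumptions for this sketch.
    """
    features = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        features.append(len(re.findall(pattern, macro_source, flags=re.IGNORECASE)))
    return features


vba = '''Sub AutoOpen()
    Set obj = CreateObject("WScript.Shell")
    obj.Run "cmd"
    Shell "calc.exe"
End Sub'''
print(keyword_count_features(vba))  # [2, 1, 0, 1]
```

Each document then becomes one fixed-length row of counts, which is the kind of tabular input the four classifiers in the abstract can consume.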

A Comparative Study of Prediction Models for College Student Dropout Risk Using Machine Learning: Focusing on the case of N university

  • 김소현;조성현
    • Journal of Korean Society of Integrative Medicine / Vol. 12, No. 2 / pp. 155-166 / 2024
  • Purpose: This study aims to identify key factors for predicting dropout risk at the university level and to provide a foundation for developing dropout prevention policies. It seeks the best machine learning algorithm by comparing the performance of various algorithms on data about college students' dropout risk. Methods: Data on factors influencing dropout risk and propensity were collected from N University. The data were applied to several machine learning algorithms, including random forest, decision tree, artificial neural network, logistic regression, support vector machine (SVM), k-nearest neighbor (k-NN) classification, and naive Bayes. The models' performance was compared and evaluated, focusing on predictive validity and on identifying significant dropout factors through the information gain index. Results: Binary logistic regression showed that year in the program, department, grades, and year of entry had statistically significant effects on dropout risk. Among the machine learning algorithms, random forest performed best. The relative importance of the predictor variables was highest for department, followed by age, grade, and whether the student's place of residence matched the school location. Conclusion: Machine learning-based prediction of dropout risk focuses on the early identification of at-risk students. The types and causes of dropout crises vary widely among students, so it is important to identify them in order to take appropriate action and provide support that removes risk factors and strengthens protective factors. The relative importance of the dropout risk factors found in this study can help guide educational interventions to prevent college student dropout.
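
The information gain index used to rank dropout factors is commonly defined as the reduction in label entropy after splitting on a feature, $IG(Y, X) = H(Y) - \sum_v \frac{|Y_v|}{|Y|} H(Y_v)$. Assuming that standard definition (the abstract does not spell out the exact variant), a minimal sketch with made-up data:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted entropy after grouping by feature value."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical students: 1 = dropped out, 0 = stayed.
dropout = [1, 1, 0, 0]
residence_matches = ["no", "no", "yes", "yes"]  # perfectly separates the labels
unrelated = ["a", "b", "a", "b"]                # carries no information
print(information_gain(residence_matches, dropout))  # 1.0
print(information_gain(unrelated, dropout))          # 0.0
```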

Feature selection and prediction modeling of drug responsiveness in Pharmacogenomics

  • 김규환;김원국
    • The Korean Journal of Applied Statistics / Vol. 34, No. 2 / pp. 153-166 / 2021
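  • The main goal of pharmacogenomics research is to predict an individual's drug responsiveness from high-dimensional genetic variables. Because the number of variables is large, feature selection is needed to reduce it, and the selected variables are used to build prediction models with machine learning algorithms. In this study, we applied several hybrid feature selection methods, such as combinations of logistic regression, ReliefF, TuRF, random forest, and LASSO, to next-generation sequencing data from 400 patients with epilepsy. We then applied machine learning methods including random forest, gradient boosting, and support vector machine to the selected variables and built an ensemble model through stacking. The results showed that the stacking model using hybrid random forest and ReliefF feature selection outperformed the other models. Based on 5-fold cross-validation, the best model had a mean validation accuracy of 0.727 and a mean validation AUC of 0.761. In addition, when the same variables were used, the stacking model outperformed the single machine learning prediction models.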
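
The reported mean validation AUC is the average of the per-fold AUC values from 5-fold cross-validation. One way to compute a single fold's ROC-AUC without extra libraries is the Mann-Whitney formulation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. This is a generic sketch, not the authors' code, and the labels and scores are invented for illustration.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented validation fold: 1 = responder, 0 = non-responder.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.4]
print(roc_auc(labels, scores))  # 7.5 / 9, about 0.833
```

Averaging this value over the five validation folds gives a mean validation AUC of the kind quoted in the abstract.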

Development of Forest Volume Estimation Model Using Airborne LiDAR Data - A Case Study of Mixed Forest in Aedang-ri, Chunyang-myeon, Bonghwa-gun -

  • 조승완;김용구;박주원
    • Journal of the Korean Association of Geographic Information Studies / Vol. 20, No. 3 / pp. 181-194 / 2017
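  • The purpose of this study is to develop regression models for estimating forest volume from field forest-volume data and airborne LiDAR data. The response variable of the estimation models is the forest volume calculated for each of 30 circular sample plots selected by random sampling in Bonghwa-gun, Gyeongsangbuk-do; the predictor variables are the height percentiles (HP) and height bins (HB, the percentage of points within each height layer) extracted for each plot from the raw airborne LiDAR data. Candidate models were derived using simple linear regression, quadratic polynomial regression, and multiple regression with stepwise variable selection, and each model was cross-validated to obtain its PRESS statistic. Comparing model fit by $R^2$ and PRESS, the multiple regression model with $HB_{5-10}$, $HB_{15-20}$, $HB_{20-25}$, and $HB_{>25}$ had the highest $R^2$ (0.509), and the $HP_{25}$ simple regression model had the lowest PRESS (122.352). For estimating forest volume in Korea, where forests have a complex vertical structure, the model with $HB_{5-10}$, $HB_{15-20}$, $HB_{20-25}$, and $HB_{>25}$, which captures diverse vertical information, is considered relatively more suitable.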
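
PRESS is the sum of squared leave-one-out prediction errors, $\mathrm{PRESS} = \sum_{i=1}^{n} (y_i - \hat{y}_{(i)})^2$, where $\hat{y}_{(i)}$ is the prediction for plot $i$ from a model fitted without it. For a simple linear regression such as the $HP_{25}$ model, the sketch below computes PRESS by refitting and checks it against the leverage shortcut $e_i / (1 - h_{ii})$; the data values are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def press(x, y):
    """PRESS by explicit leave-one-out refitting (needs at least 3 distinct x values)."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_simple_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


def press_hat(x, y):
    """Same quantity from one fit, using leverages h_ii = 1/n + (x_i - mean)^2 / Sxx."""
    n = len(x)
    b0, b1 = fit_simple_ols(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        h = 1 / n + (xi - mx) ** 2 / sxx  # leverage of observation i
        total += ((yi - (b0 + b1 * xi)) / (1 - h)) ** 2
    return total


# Hypothetical plot data: predictor value and stand volume.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
print(press(x, y), press_hat(x, y))  # the two values agree up to rounding
```

Unlike $R^2$, which always rises as predictors are added, PRESS penalises models that fit the calibration plots well but predict held-out plots poorly, which is why the abstract compares both.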

A Study on Fog Forecasting Method through Data Mining Techniques in Jeju

  • 이영미;배주현;박다빈
    • Journal of Environmental Science International / Vol. 25, No. 4 / pp. 603-613 / 2016
  • Fog can have a significant impact on road conditions. To improve fog predictability in Jeju, we applied machine learning with various data mining techniques: tree models, conditional inference trees, random forest, multinomial logistic regression, neural networks, and support vector machines. To validate the machine learning models, the simulation results were compared with fog data observed at Jeju (ASOS site 184) and Gosan (ASOS site 185). The prediction rates of all six data mining methods exceeded 92% at both sites. We also validated the machine learning models using meteorological output from the WRF (Weather Research and Forecasting) model and found that their performance is still not good enough for operational fog forecasting. Model assessment using metrics from the confusion matrix showed that the neural network was the most effective fog prediction method.
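
The abstract does not name the confusion-matrix metrics it used. Probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI) are standard scores for rare-event forecasts such as fog, so the sketch below assumes those alongside plain accuracy; the counts are made up.

```python
def fog_scores(hits, misses, false_alarms, correct_negatives):
    """Categorical verification scores from a 2x2 forecast/observation table."""
    total = hits + misses + false_alarms + correct_negatives
    return {
        "POD": hits / (hits + misses),                # share of observed fog that was forecast
        "FAR": false_alarms / (hits + false_alarms),  # share of fog forecasts that were wrong
        "CSI": hits / (hits + misses + false_alarms), # ignores the (many) correct "no fog" cases
        "accuracy": (hits + correct_negatives) / total,
    }


# Made-up counts for 1,000 forecasts in which fog is rare.
print(fog_scores(hits=30, misses=10, false_alarms=20, correct_negatives=940))
# {'POD': 0.75, 'FAR': 0.4, 'CSI': 0.5, 'accuracy': 0.97}
```

With these counts accuracy is 0.97 while CSI is only 0.5, which illustrates how a high overall hit rate can coexist with modest detection skill when the event is rare.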

Reservoir Water Level Forecasting Using Machine Learning Models

  • 서영민;최은혁;여운기
    • Journal of the Korean Society of Agricultural Engineers / Vol. 59, No. 3 / pp. 97-110 / 2017
  • This study investigates the efficiency of machine learning models, including an artificial neural network (ANN), a generalized regression neural network (GRNN), an adaptive neuro-fuzzy inference system (ANFIS), and random forest (RF), for reservoir water level forecasting at the Chungju Dam, South Korea. Model efficiency was assessed using model efficiency indices and graphical comparison. The forecasting results depended on the lead time and on the combination of input variables. For a lead time of t = 1 day, the ANFIS1 and ANN6 models gave better forecasts than the RF6 and GRNN6 models. For t = 5 days, the ANN1 and RF6 models outperformed the ANFIS1 and GRNN3 models. For t = 10 days, the ANN3 and RF1 models performed better than the ANFIS3 and GRNN3 models. Overall, the ANN model performed best at all lead times in terms of model efficiency and graphical comparison. These results indicate that reservoir water level forecasting should use the optimal combination of input variables and forecasting model for each lead time, rather than a single combination of input variables and forecasting model for all lead times.
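
Because the results depend on lead time, each forecasting model is trained on input/target pairs shifted by that lead time. The sketch below frames a daily water-level series this way using only lagged water levels as inputs. That input choice is an assumption, since the input combinations behind model names such as ANN6 or RF1 are not described in the abstract, and the series values are hypothetical.

```python
def make_supervised(series, n_lags, lead_time):
    """Pair the last n_lags observations with the value lead_time steps ahead.

    For each usable time index t, the input is series[t - n_lags + 1 : t + 1]
    and the target is series[t + lead_time].
    """
    inputs, targets = [], []
    for t in range(n_lags - 1, len(series) - lead_time):
        inputs.append(series[t - n_lags + 1 : t + 1])
        targets.append(series[t + lead_time])
    return inputs, targets


# Hypothetical daily water levels (m).
levels = [10, 11, 12, 13, 14, 15]
print(make_supervised(levels, n_lags=2, lead_time=1))
# ([[10, 11], [11, 12], [12, 13], [13, 14]], [12, 13, 14, 15])
print(make_supervised(levels, n_lags=2, lead_time=2))
# ([[10, 11], [11, 12], [12, 13]], [13, 14, 15])
```

Longer lead times leave fewer training pairs from the same record, and a separate model (or input combination) is fitted per lead time, matching the abstract's recommendation not to use a single configuration for all lead times.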