• Title/Summary/Keyword: decision tree regression analysis (의사결정나무회귀분석)

Search results: 124

Inconsistent Pattern Model for Improving the Performance of Supervised Learning in Data Mining (데이터 마이닝의 지도학습 기법 성능향상을 위한 불일치 패턴 모델)

  • Heo, Jun;Kim, Jong-U
    • Proceedings of the Korean Operations and Management Science Society Conference
    • 2007.11a
    • pp.288-305
    • 2007
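  • This paper presents the inconsistent pattern model (also called the error pattern model), a new hybrid and combined technique for improving the performance of supervised learning, the best-known family of data mining techniques. The inconsistent pattern model is a meta-classifier that learns, record by record, which of two or more techniques is likely to predict that record more accurately; the goal is to achieve better classification accuracy and greater predictive lift than the existing techniques. We build inconsistent pattern models from representative supervised learning techniques, namely the decision tree induction methods C5.0 and C&RT, neural networks, and logistic regression, and demonstrate on 23 datasets, comprising real-world data and reputable public benchmark data, that they outperform both the individual techniques and the existing combined models Bagging, Boosting, and Stacking. We also examine how the performance of the inconsistent pattern model changes with the characteristics of the data, and under which conditions it improves, in order to broaden the model's applicability.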
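The core of the meta-classification idea can be sketched as follows, assuming two base classifiers whose predictions are already available as lists. The function names and the rule of training the meta-classifier only on records where exactly one base model is correct are our assumptions, not details taken from the paper; `choose_b` stands in for the output of whatever meta-classifier is trained on the meta-labels.

```python
from typing import List, Sequence, Tuple


def build_meta_training_set(
    features: Sequence[Sequence[float]],
    y_true: Sequence[int],
    pred_a: Sequence[int],
    pred_b: Sequence[int],
) -> Tuple[List[Sequence[float]], List[int]]:
    """Build training data for a hypothetical meta-classifier.

    Meta-label 0 means "model A was right", 1 means "model B was right".
    Records where both models are right, or both are wrong, say nothing
    about which model to trust, so they are skipped.
    """
    meta_x: List[Sequence[float]] = []
    meta_y: List[int] = []
    for x, y, a, b in zip(features, y_true, pred_a, pred_b):
        a_ok, b_ok = a == y, b == y
        if a_ok != b_ok:
            meta_x.append(x)
            meta_y.append(0 if a_ok else 1)
    return meta_x, meta_y


def combine(pred_a: Sequence[int], pred_b: Sequence[int],
            choose_b: Sequence[bool]) -> List[int]:
    """Route each record to the base model the meta-classifier picked."""
    return [b if use_b else a for a, b, use_b in zip(pred_a, pred_b, choose_b)]
```

For example, with true labels `[1, 0, 1, 0]`, model A predicting `[1, 1, 1, 0]` and model B predicting `[0, 0, 1, 0]`, only the first two records become meta-training examples (labels 0 and 1), and routing the second record to model B recovers all four labels.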


Particulate Matter Prediction using Quantile Boosting (분위수 부스팅을 이용한 미세먼지 농도 예측)

  • Kwon, Jun-Hyeon;Lim, Yaeji;Oh, Hee-Seok
    • The Korean Journal of Applied Statistics
    • v.28 no.1
    • pp.83-92
    • 2015
  • For public health, it is important to develop an accurate method for predicting atmospheric particulate matter (PM) concentrations, because exposure to fine dust can trigger not only respiratory diseases but also dermatoses, ophthalmopathies, and cardiovascular diseases. The National Institute of Environmental Research (NIER) uses a decision tree to predict bad air quality days with a high PM concentration. However, the decision tree method, which is also inherently unstable, is not a suitable model for predicting such days, which account for only 4% of the entire data. In this paper, we show the inaccuracy and inappropriateness of the method used by the NIER and present the utility of a new prediction model that applies boosting with quantile loss functions. We evaluate the performance of the new method over various values of ${\tau}$ and justify it through comparison.
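The quantile loss behind quantile boosting is the pinball (check) loss $\rho_{\tau}(u) = u\,(\tau - \mathbf{1}[u < 0])$ with $u = y - f$. The sketch below shows only these building blocks, assuming a standard gradient-boosting setup in which each stage fits the negative gradient; the function names are hypothetical and this is not the authors' code. The `best_constant` helper illustrates why a large $\tau$ pulls the fit toward rare high values.

```python
from typing import Sequence


def pinball_loss(y: float, f: float, tau: float) -> float:
    """Quantile (check) loss rho_tau(u) = u * (tau - 1[u < 0]) with u = y - f."""
    u = y - f
    return u * (tau - (1.0 if u < 0 else 0.0))


def negative_gradient(y: float, f: float, tau: float) -> float:
    """Pseudo-residual -d(rho)/df that a boosting stage would fit.

    It is tau when the prediction is too low (y > f) and tau - 1 when it is
    too high; at y == f any value in [tau - 1, tau] is a valid subgradient,
    and this sketch returns tau - 1.
    """
    return tau if y > f else tau - 1.0


def best_constant(ys: Sequence[float], tau: float) -> float:
    """Constant prediction minimizing the summed pinball loss.

    The loss is piecewise linear and convex, so a minimizer lies at one of
    the observed values; scanning them in sorted order returns the smallest.
    """
    return min(sorted(ys), key=lambda c: sum(pinball_loss(y, c, tau) for y in ys))
```

With `ys = [1, 2, 3, 4, 100]`, `best_constant(ys, 0.5)` returns the median, 3, while `best_constant(ys, 0.9)` returns 100, so the high-$\tau$ fit tracks the rare extreme value instead of the bulk of the data.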

A Study on the Prediction of the Surface Drifter Trajectories in the Korean Strait (대한해협에서 표층 뜰개 이동 예측 연구)

  • Ha, Seung Yun;Yoon, Han-Sam;Kim, Young-Taeg
    • Journal of Korean Society of Coastal and Ocean Engineers
    • v.34 no.1
    • pp.11-18
    • 2022
  • To improve the accuracy of particle tracking prediction near the Korea Strait, this study compared and analyzed a particle tracking model based on a numerical seawater flow model and a machine-learning-based particle tracking model, using field observation data. The data used were (1) surface drifter buoy trajectories observed in the Korea Strait, (2) predictions by machine learning (linear regression and decision tree) using tide and wind data from three observation stations (Gageo Island, Geoje Island, and Gyoboncho), and (3) predictions by numerical models (ROMS and MOHID). These three datasets were compared using three error measures: the correlation coefficient (CC), the root mean square error (RMSE), and the normalized cumulative Lagrangian separation (NCLS). The decision tree model had the best prediction accuracy in terms of CC and RMSE, while the MOHID model gave the best prediction results in terms of NCLS.
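The three error measures can be sketched in a few lines. This assumes trajectories given as lists of (x, y) positions in a local planar coordinate system (for example, km) sampled at the same times, computes CC on a single coordinate series and RMSE on the position separation, and uses the NCLS definition commonly attributed to Liu and Weisberg (2011): the sum of separation distances divided by the sum of cumulative observed path lengths. How the study applied each measure is not stated in the abstract, so these choices are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient (CC) between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rmse(obs: Sequence[Point], sim: Sequence[Point]) -> float:
    """Root mean square of the distance between simulated and observed positions."""
    return math.sqrt(sum(math.dist(o, s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def ncls(obs: Sequence[Point], sim: Sequence[Point]) -> float:
    """Normalized cumulative Lagrangian separation s = sum(d_i) / sum(l_i).

    obs[0] and sim[0] are the common release point. d_i is the separation
    between simulated and observed positions at step i, and l_i is the
    observed path length travelled from the start up to step i.
    Lower is better; 0 means the simulated track matches exactly.
    """
    sep_sum = 0.0
    path_sum = 0.0
    travelled = 0.0
    for i in range(1, len(obs)):
        travelled += math.dist(obs[i - 1], obs[i])
        path_sum += travelled
        sep_sum += math.dist(obs[i], sim[i])
    return sep_sum / path_sum
```

Because NCLS divides by the distance the drifter actually travelled, it penalizes the same absolute error less on long, fast trajectories, which is one reason it can rank models differently from RMSE.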

A study on the development of severity-adjusted mortality prediction model for discharged patient with acute stroke using machine learning (머신러닝을 이용한 급성 뇌졸중 퇴원 환자의 중증도 보정 사망 예측 모형 개발에 관한 연구)

  • Baek, Seol-Kyung;Park, Jong-Ho;Kang, Sung-Hong;Park, Hye-Jin
    • Journal of the Korea Academia-Industrial cooperation Society
    • v.19 no.11
    • pp.126-136
    • 2018
  • The purpose of this study was to develop a severity-adjusted model for predicting mortality in acute stroke patients using machine learning. From the Korean National Hospital Discharge In-depth Injury Survey for 2006 to 2015, patients with disease codes I60-I63 (KCD-7) were extracted for analysis. Three tools were used for severity adjustment of comorbidity: the Charlson Comorbidity Index (CCI), the Elixhauser Comorbidity Index (ECI), and the Clinical Classification Software (CCS). Severity-adjusted mortality prediction models for patients with acute stroke were developed using logistic regression, decision tree, neural network, and support vector machine methods. The most common comorbidity in stroke patients was "hypertension, uncomplicated" (43.8%) in the ECI and "essential hypertension" (43.9%) in the CCS. Among the CCI, ECI, and CCS, the CCS had the highest AUC and was therefore confirmed as the best severity-adjustment tool. For the CCS-based model, whose variables included main diagnosis, gender, age, admission route, and whether surgery was performed, the AUC values were 0.808 for logistic regression, 0.785 for the decision tree, 0.809 for the neural network, and 0.830 for the support vector machine, so the support vector machine achieved the best predictive power. The results of this study can be used in establishing future health policy.
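The AUC used above to rank the comorbidity tools and classifiers equals the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counting one half. A minimal sketch of that pairwise (Mann-Whitney) definition; the function name and the example scores are illustrative and are not the study's code or data.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the Mann-Whitney U statistic / (n_pos * n_neg).

    labels: 1 for the positive class (e.g. death), 0 otherwise.
    This pairwise loop is O(n_pos * n_neg): fine for illustration, but a
    rank-based formula is preferable for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, `auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` is 0.75, because three of the four positive-negative pairs are ordered correctly.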

Developing a Binary Classification Method for Bankruptcy Prediction (기업도산예측을 위한 이진분류기법의 개발)

  • Min, Jae-Hyeong;Jeong, Cheol-U
    • Korea Society of Management Information Systems: Conference Proceedings (한국경영정보학회:학술대회논문집)
    • 2007.06a
    • pp.619-624
    • 2007
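  • This study aims to develop a new bankruptcy prediction technique based on a genetic algorithm and to verify its validity and predictive superiority. The proposed binary classification method defines virtual companies that represent bankrupt and non-bankrupt firms, and classifies a target company as bankrupt or not by measuring its similarity to each virtual company. The variable values of the virtual companies and the weight of each variable are obtained with a genetic algorithm so as to maximize classification accuracy on the training data. To verify the validity of the proposed method, we experimentally compared its predictive performance with that of existing bankruptcy prediction techniques; the proposed method achieved higher predictive accuracy than multivariate discriminant analysis, logistic regression, decision tree, and artificial neural network models.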
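The classification rule can be sketched as follows, assuming a weighted Euclidean distance as the similarity measure (the abstract does not say which measure is used) and training-set accuracy as the genetic algorithm's fitness. The GA itself, which would search over the two virtual-company vectors and the weights, is omitted here; `fitness` is the function such a GA would maximize. All names and the chromosome layout are hypothetical.

```python
import math
from typing import Sequence


def weighted_distance(x: Sequence[float], center: Sequence[float],
                      w: Sequence[float]) -> float:
    """Weighted Euclidean distance; a larger weight makes a variable matter more."""
    return math.sqrt(sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, center, w)))


def classify(x: Sequence[float], bankrupt_vc: Sequence[float],
             healthy_vc: Sequence[float], w: Sequence[float]) -> int:
    """Return 1 (bankrupt) if the firm is strictly closer to the bankrupt
    virtual company, else 0 (ties go to non-bankrupt)."""
    return int(weighted_distance(x, bankrupt_vc, w) < weighted_distance(x, healthy_vc, w))


def fitness(chromosome: Sequence[float], X: Sequence[Sequence[float]],
            y: Sequence[int], n_vars: int) -> float:
    """Training accuracy of one chromosome laid out as
    [bankrupt_vc (n_vars values), healthy_vc (n_vars values), weights (n_vars values)]."""
    b = chromosome[:n_vars]
    h = chromosome[n_vars:2 * n_vars]
    w = chromosome[2 * n_vars:3 * n_vars]
    correct = sum(classify(x, b, h, w) == t for x, t in zip(X, y))
    return correct / len(y)
```

Encoding the virtual companies and the weights in one flat vector keeps standard GA operators (crossover, mutation) applicable without any problem-specific logic.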


Comparison of the Prediction Model of Adolescents' Suicide Attempt Using Logistic Regression and Decision Tree: Secondary Data Analysis of the 2019 Youth Health Risk Behavior Web-Based Survey (로지스틱 회귀모형과 의사결정 나무모형을 활용한 청소년 자살 시도 예측모형 비교: 2019 청소년 건강행태 온라인조사를 이용한 2차 자료분석)

  • Lee, Yoonju;Kim, Heejin;Lee, Yesul;Jeong, Hyesun
    • Journal of Korean Academy of Nursing
    • v.51 no.1
    • pp.40-53
    • 2021
  • Purpose: The purpose of this study was to develop and compare prediction models for suicide attempts by Korean adolescents using logistic regression and decision tree analysis. Methods: This study used secondary data from the 2019 Youth Health Risk Behavior Web-Based Survey. A total of 20 items were selected as explanatory variables (5 sociodemographic characteristics, 10 health-related behaviors, and 5 psychosocial characteristics). Descriptive statistics, logistic regression for complex samples, and decision tree analysis were performed using IBM SPSS ver. 25.0 and Stata ver. 16.0. Results: Of the 57,303 participants, 1,731 (3.0%) reported having attempted suicide. In the logistic regression model, the most significant predictors of suicide attempts were experiences of sadness and hopelessness, substance abuse, and violent victimization. In the decision tree model, girls who had experienced both sadness and hopelessness and substance abuse were identified as the group most vulnerable to suicide attempts. Conclusion: Experiences of sadness and hopelessness, substance abuse, and violent victimization were common major predictors of suicide attempts in both the logistic regression and decision tree models, and the prediction rates of the two models were similar. We suggest providing suicide prevention programs for adolescents that take combinations of high-risk predictors into account.

Development of Prediction Model for Prevalence of Metabolic Syndrome Using Data Mining: Korea National Health and Nutrition Examination Study (국민건강영양조사를 활용한 대사증후군 유병 예측모형 개발을 위한 융복합 연구: 데이터마이닝을 활용하여)

  • Kim, Han-Kyoul;Choi, Keun-Ho;Lim, Sung-Won;Rhee, Hyun-Sill
    • Journal of Digital Convergence
    • v.14 no.2
    • pp.325-332
    • 2016
  • The purpose of this study is to investigate the attributes influencing the prevalence of metabolic syndrome and to develop a prediction model of metabolic syndrome for people aged 40 and over, using the 2012 Korea National Health and Nutrition Examination Survey. The attributes for the prediction model were selected through a literature review. We applied three data mining algorithms, decision tree, logistic regression, and artificial neural network, using Weka 3.6. Among the input attributes, socioeconomic status factors ranked higher than health-related factors, and the prediction model using the decision tree algorithm showed the highest accuracy. This study suggests that the prevention and management of metabolic syndrome should be approached from the perspective of both socioeconomic status and health-related factors. In addition, consistent with other research, decision tree algorithms are useful in public health because their results are easy to interpret.

Study on Predicting the Designation of Administrative Issue in the KOSDAQ Market Based on Machine Learning Based on Financial Data (머신러닝 기반 KOSDAQ 시장의 관리종목 지정 예측 연구: 재무적 데이터를 중심으로)

  • Yoon, Yanghyun;Kim, Taekyung;Kim, Suyeong
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • v.17 no.1
    • pp.229-249
    • 2022
  • This paper investigates machine learning models for predicting the designation of administrative issues in the KOSDAQ market. When a company in the Korean stock market is designated as an administrative issue, the market treats the event itself as negative information, causing losses to the company and its investors. The purpose of this study is to evaluate alternative methods for developing an artificial intelligence service that detects, early and from companies' financial ratios, the likelihood of designation as an administrative issue, and that helps investors manage portfolio risk. The independent variables are 21 financial ratios representing profitability, stability, activity, and growth. Financial data of administrative-issue and non-administrative-issue companies were sampled from 2011 to 2020, the period in which K-IFRS was applied. Logistic regression, decision tree, support vector machine, random forest, and LightGBM were used to predict the designation of administrative issues. LightGBM, with 82.73% classification accuracy, was the best prediction model, and the decision tree, with 71.94% accuracy, had the lowest classification accuracy. Examining the top three variables by importance in the decision-tree-based learning models, ROE (net profit) and the capital stock turnover ratio were common to all models, indicating that they are relatively important variables in the designation of administrative issues. Overall, the ensemble learning models showed higher predictive performance than the single learning models.

Analysis of Factors for Seasonal Meat Color Characteristics in Hanwoo(Korean Cattle) Beef using Decision Tree Method (의사결정나무분석기법을 이용한 계절별 한우육의 육색 특성에 미치는 요인분석)

  • Kim, Seok-Jung;Kim, Yong-Sun;Song, Young-Han;Lee, Sung-Ki
    • Journal of Animal Science and Technology
    • v.44 no.5
    • pp.607-616
    • 2002
  • This study analyzed, by season, the effects of pH, sex, backfat thickness, ribeye area, cold carcass weight, shipping month, muscle internal temperature, average daily temperature, and average relative humidity on the meat color of slaughtered Hanwoo. The analyses focused on the individual and interaction effects of these factors on meat color. Multiple linear regression showed that all meat color values decreased as pH increased and increased as backfat thickness increased. In the decision tree analysis of each factor, lightness (L*) was highest for cows and steers slaughtered in spring and autumn. For redness (a*), the distinguishing conditions were a pH below 5.63 and an average relative humidity above 71.5% in Hanwoo slaughtered in autumn. Chroma (C*) was highest for Hanwoo slaughtered in summer and autumn with a pH below 5.60 and a backfat thickness above 8 mm. For the hue angle ($h^{\circ}$), the distinguishing conditions were a muscle internal temperature below $4.7\,^{\circ}\mathrm{C}$, a pH below 5.66, and a backfat thickness above 8 mm in Hanwoo slaughtered in spring, summer, and autumn.
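Thresholds such as "pH below 5.63" come from the way a regression tree chooses its splits: for each candidate cut point on a numeric predictor, it measures how much the within-node sum of squared errors (SSE) drops and keeps the best cut. A minimal sketch for a single predictor, assuming a CART-style SSE criterion (the abstract does not name the tree algorithm) and midpoints between sorted values as candidates; the pH and redness values in the usage note are invented for illustration.

```python
from typing import Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the node mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (threshold, SSE reduction) for the best rule 'x < threshold',
    or None if every x value is identical."""
    pairs = sorted(zip(x, y))
    parent = sse([v for _, v in pairs])
    best: Optional[Tuple[float, float]] = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        gain = parent - sse(left) - sse(right)
        if best is None or gain > best[1]:
            best = (threshold, gain)
    return best
```

With hypothetical pH values `[5.5, 5.55, 5.6, 5.7, 5.8, 5.9]` and redness values `[20, 21, 20.5, 15, 14, 15.5]`, the best rule is pH < 5.65, the midpoint between the two groups; a full tree applies the same search recursively to each child node and across all predictors.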

Study on Detection for Cochlodinium polykrikoides Red Tide using the GOCI image and Machine Learning Technique (GOCI 영상과 기계학습 기법을 이용한 Cochlodinium polykrikoides 적조 탐지 기법 연구)

  • Unuzaya, Enkhjargal;Bak, Su-Ho;Hwang, Do-Hyun;Jeong, Min-Ji;Kim, Na-Kyeong;Yoon, Hong-Joo
    • The Journal of the Korea Institute of Electronic Communication Sciences
    • v.15 no.6
    • pp.1089-1098
    • 2020
  • In this study, we propose a method to detect Cochlodinium polykrikoides red tide using machine learning and geostationary ocean satellite images. To train the machine learning models, GOCI Level 2 data and red tide location data from the National Fisheries Research and Development Institute were used. Three models were used: logistic regression, decision tree, and random forest. In the performance evaluation, compared with the conventional GOCI image-based red tide detection algorithm without machine learning (Son et al., 2012), whose accuracy was 75%, accuracy improved by about 13 to 22 percentage points (88 to 98%). Comparing detection performance among the machine learning models, the random forest model (98%) showed the highest detection accuracy. This machine-learning-based red tide detection algorithm could be used to detect red tides early and to track and monitor their movement and spread.