• Title/Summary/Keyword: regression tree

Multivariate quantile regression tree (다변량 분위수 회귀나무 모형에 대한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Bang, Sungwan
    • Journal of the Korean Data and Information Science Society / v.28 no.3 / pp.533-545 / 2017
  • Quantile regression models provide a variety of useful statistical information by estimating the conditional quantile function of the response variable. However, the traditional linear quantile regression model can lead to distorted and incorrect results when analysing real data with a nonlinear relationship between the explanatory variables and the response variables. Furthermore, as the complexity of the data increases, multiple response variables need to be analysed simultaneously, with more sophisticated interpretations. For these reasons, we propose a multivariate quantile regression tree model. In this paper, a new split variable selection algorithm is suggested for a multivariate regression tree model. This algorithm can select the split variable more accurately than the previous method, without significant selection bias. We investigate the performance of our proposed method with both simulation and real data studies.
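
As an illustration of the general idea behind a quantile regression tree, the sketch below scores a constant leaf prediction with the check (pinball) loss and searches a single variable for the threshold that minimizes the summed loss of the two resulting leaves. It is a generic single-response, exhaustive-search sketch, not the split variable selection algorithm proposed in the paper; the function names and the toy data are hypothetical.

```python
def pinball_loss(y, q, tau):
    """Average check (pinball) loss of predicting the constant q at quantile level tau."""
    return sum(tau * (v - q) if v >= q else (1.0 - tau) * (q - v) for v in y) / len(y)


def leaf_quantile(y, tau):
    """Observed value minimizing the pinball loss, i.e. an empirical tau-quantile."""
    return min(sorted(y), key=lambda q: pinball_loss(y, q, tau))


def best_split(x, y, tau):
    """Exhaustively search thresholds on x; return (threshold, summed pinball loss)."""
    best = None
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right leaf empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        loss = (len(left) * pinball_loss(left, leaf_quantile(left, tau), tau)
                + len(right) * pinball_loss(right, leaf_quantile(right, tau), tau))
        if best is None or loss < best[1]:
            best = (t, loss)
    return best


# Hypothetical toy data with a jump in the response between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
threshold, loss = best_split(x, y, tau=0.5)
print(threshold, round(loss, 3))
```

Exhaustive search of this kind is known to favour variables with many candidate split points, which is the sort of selection bias that dedicated split variable selection methods aim to avoid.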

Development and Evaluation of Electronic Health Record Data-Driven Predictive Models for Pressure Ulcers (전자건강기록 데이터 기반 욕창 발생 예측모델의 개발 및 평가)

  • Park, Seul Ki;Park, Hyeoun-Ae;Hwang, Hee
    • Journal of Korean Academy of Nursing / v.49 no.5 / pp.575-585 / 2019
  • Purpose: The purpose of this study was to develop predictive models for pressure ulcer incidence using electronic health record (EHR) data and to compare their predictive validity performance indicators with those of the Braden Scale used in the study hospital. Methods: A retrospective case-control study was conducted in a tertiary teaching hospital in Korea. Data of 202 pressure ulcer patients and 14,705 non-pressure ulcer patients admitted between January 2015 and May 2016 were extracted from the EHRs. Three predictive models for pressure ulcer incidence were developed using logistic regression, Cox proportional hazards regression, and decision tree modeling. The predictive validity performance indicators of the three models were compared with those of the Braden Scale. Results: The logistic regression model performed best, with an area under the receiver operating characteristic curve (AUC) estimate of 0.97, followed by the decision tree model (AUC 0.95), the Cox proportional hazards regression model (AUC 0.95), and the Braden Scale (AUC 0.82). Decreased mobility was the most significant factor in the logistic regression and Cox proportional hazards models, and the endotracheal tube was the most important factor in the decision tree model. Conclusion: Predictive validity performance indicators of the Braden Scale were lower than those of the logistic regression, Cox proportional hazards regression, and decision tree models. The models developed in this study can be used to develop a clinical decision support system that automatically assesses risk for pressure ulcers to aid nurses.
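
The AUC values compared in this abstract can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case. The following minimal sketch computes that pairwise quantity directly; the function name and the scores are hypothetical and are not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical risk scores for patients with and without the outcome.
with_outcome = [0.9, 0.8, 0.4]
without_outcome = [0.7, 0.3, 0.2, 0.1]
print(round(auc(with_outcome, without_outcome), 3))
```

Because this quantity depends only on the ordering of the scores, it allows a probability model and an ordinal risk scale to be compared on the same footing.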

A Study on Determinants of Stockpile Ammunition using Data Mining (데이터 마이닝을 활용한 장기저장탄약 상태 결정요인 분석 연구)

  • Roh, Yu Chan;Cho, Nam-Wook;Lee, Dongnyok
    • Journal of Korean Society for Quality Management / v.48 no.2 / pp.297-307 / 2020
  • Purpose: The purpose of this study is to analyze the factors that affect ammunition performance by applying data mining techniques to the Ammunition Stockpile Reliability Program (ASRP) data of the 155mm propelling charge. Methods: ASRP data from 1999 to 2017 were used. Logistic regression and decision tree analysis were used to investigate the factors that affect the performance of ammunition. The performance of each model was evaluated through comparison with an artificial neural network (ANN) model. Results: Logistic regression and decision tree analysis showed that the major defect rate in visual inspection is the most significant factor. Muzzle velocity by base charge and muzzle velocity by increment charge are also among the significant factors affecting the performance of the 155mm propelling charge. To validate the logistic regression and decision tree models, their classification accuracies were compared with the results of an ANN model. The results indicate that the logistic regression and decision tree models show sufficient performance, which confirms the validity of the models. Conclusion: The main contribution of this paper is that, to the best of our knowledge, it is the first attempt at identifying the significant factors of ASRP data by using data mining techniques. The approaches suggested in the paper could also be extended to other types of ammunition data.

Comparison of the Prediction Model of Adolescents' Suicide Attempt Using Logistic Regression and Decision Tree: Secondary Data Analysis of the 2019 Youth Health Risk Behavior Web-Based Survey (로지스틱 회귀모형과 의사결정 나무모형을 활용한 청소년 자살 시도 예측모형 비교: 2019 청소년 건강행태 온라인조사를 이용한 2차 자료분석)

  • Lee, Yoonju;Kim, Heejin;Lee, Yesul;Jeong, Hyesun
    • Journal of Korean Academy of Nursing / v.51 no.1 / pp.40-53 / 2021
  • Purpose: The purpose of this study was to develop and compare prediction models for suicide attempts by Korean adolescents using logistic regression and decision tree analysis. Methods: This study utilized secondary data drawn from the 2019 Youth Health Risk Behavior web-based survey. A total of 20 items were selected as the explanatory variables (5 on sociodemographic characteristics, 10 on health-related behaviors, and 5 on psychosocial characteristics). For data analysis, descriptive statistics, logistic regression with complex samples, and decision tree analysis were performed using IBM SPSS ver. 25.0 and Stata ver. 16.0. Results: A total of 1,731 participants (3.0%) out of 57,303 responded that they had attempted suicide. The most significant predictors of suicide attempts as determined using the logistic regression model were experience of sadness and hopelessness, substance abuse, and violent victimization. In the decision tree model, girls who had experienced sadness and hopelessness and substance abuse were identified as the group most vulnerable to suicide attempts. Conclusion: Experiences of sadness and hopelessness, substance abuse, and violent victimization are the common major predictors of suicide attempts in both the logistic regression and decision tree models, and the prediction rates of both models were similar. We suggest providing programs for adolescents that consider combinations of high-risk predictors in order to prevent suicide attempts.

Selecting Machine Learning Model Based on Natural Language Processing for Shanghanlun Diagnostic System Classification (자연어 처리 기반 『상한론(傷寒論)』 변병진단체계(辨病診斷體系) 분류를 위한 기계학습 모델 선정)

  • Young-Nam Kim
    • 대한상한금궤의학회지 / v.14 no.1 / pp.41-50 / 2022
  • Objective: The purpose of this study is to explore the most suitable machine learning algorithm for Shanghanlun diagnostic system classification using natural language processing (NLP). Methods: A total of 201 data items were collected from 『Shanghanlun』 and 『Clinical Shanghanlun』; 'Taeyangbyeong-gyeolhyung' and 'Eumyangyeokchahunobokbyeong' were excluded to prevent oversampling or undersampling. Data were preprocessed using a Twitter Korean tokenizer and used to train logistic regression, ridge regression, lasso regression, naive Bayes classifier, decision tree, and random forest models. The accuracy of the models was compared. Results: Ridge regression and the naive Bayes classifier showed an accuracy of 0.843, logistic regression and random forest showed an accuracy of 0.804, and the decision tree showed an accuracy of 0.745, while lasso regression showed an accuracy of 0.608. Conclusions: Ridge regression and the naive Bayes classifier are suitable NLP machine learning models for Shanghanlun diagnostic system classification.

Investment, Export, and Exchange Rate on Prediction of Employment with Decision Tree, Random Forest, and Gradient Boosting Machine Learning Models (투자와 수출 및 환율의 고용에 대한 의사결정 나무, 랜덤 포레스트와 그래디언트 부스팅 머신러닝 모형 예측)

  • Chae-Deug Yi
    • Korea Trade Review / v.46 no.2 / pp.281-299 / 2021
  • This paper analyzes the feasibility of using machine learning methods to forecast employment. Machine learning methods such as decision trees and artificial neural networks, and ensemble models such as random forests and gradient boosting regression trees, were used to forecast employment in the Busan regional economy. The main findings of the comparison of their predictive abilities are as follows. First, machine learning methods can predict employment well. Second, the employment forecasts of the decision tree models differed somewhat according to the depth of the trees. Third, the artificial neural network model, however, does not show high predictive power. Fourth, ensemble models such as the random forest and gradient boosting regression tree models show higher predictive power. Thus, since machine learning methods can accurately predict employment, they should be used to improve the accuracy of employment forecasting.

Comparative Analysis of Predictors of Depression for Residents in a Metropolitan City using Logistic Regression and Decision Making Tree (로지스틱 회귀분석과 의사결정나무 분석을 이용한 일 대도시 주민의 우울 예측요인 비교 연구)

  • Kim, Soo-Jin;Kim, Bo-Young
    • The Journal of the Korea Contents Association / v.13 no.12 / pp.829-839 / 2013
  • This study is a descriptive study that aims to predict and compare the factors of depression affecting residents in a metropolitan city by using logistic regression analysis and decision-making tree analysis. The subjects of the study were 462 residents ($20 \leq \text{age} < 65$) in a metropolitan city. Data were collected between October 7, 2011 and October 21, 2011 and analyzed with frequency analysis, percentages, means and standard deviations, the $\chi^2$-test, the t-test, logistic regression analysis, the ROC curve, and a decision-making tree, using the SPSS 18.0 program. The common predictors of depression in community residents were social dysfunction, perceived physical symptoms, and family support. The specificity and sensitivity of the logistic regression were 93.8% and 42.5%, respectively. The receiver operating characteristic (ROC) curve was used to determine an optimal model. The AUC (area under the curve) was .84, and the ROC curve was found to be statistically significant (p<.001). The specificity and sensitivity of the decision-making tree analysis were 98.3% and 20.8%, respectively. The overall classification accuracy was 82.0% for the logistic regression and 80.5% for the decision-making tree analysis. From these results, it is believed that logistic regression analysis, which showed higher sensitivity and classification accuracy, may be useful for establishing a depression prediction model for community residents.
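
The specificity, sensitivity, and classification accuracy reported here all follow from the same confusion matrix. The sketch below shows the arithmetic with hypothetical counts (they are not the study's data) and illustrates why a model can reach high accuracy with low sensitivity when non-cases dominate the sample.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true cases that are detected
    specificity = tn / (tn + fp)   # share of non-cases correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 100 depressed and 360 non-depressed respondents.
sens, spec, acc = classification_rates(tp=40, fn=60, tn=340, fp=20)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

With these made-up counts the classifier misses 60% of the cases yet is still right for more than 80% of respondents, which is why sensitivity is worth reporting alongside overall accuracy.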

Unit Nonresponse Weighting Adjustment Using Regression Tree (회귀나무를 이용한 무응답 가중치 조정)

  • Kim, Se-Mi;Lee, Seok-Hun
    • Proceedings of the Korean Association for Survey Research Conference / 2005.12a / pp.169-183 / 2005
  • This paper considers the formation of nonresponse weighting adjustment cells for handling unit nonresponse in sample surveys. We propose a multivariate regression tree method for segmentation that uses the variable of interest and the estimated response probability simultaneously to construct effective nonresponse adjustment cells. Two cases are considered: one uses only response data and the other uses both response and nonresponse data. These two cases are compared in terms of bias.
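
Once adjustment cells have been formed, the usual weighting-class adjustment inflates each respondent's weight by the inverse of the weighted response rate in its cell. The sketch below assumes the cells are already given as labels (in the paper they would come from the regression tree segmentation); the function name, cell labels, and weights are hypothetical.

```python
from collections import defaultdict


def adjust_weights(units):
    """Weighting-class adjustment.

    units: list of dicts with keys 'cell', 'weight', 'responded'.
    Returns adjusted weights in input order (0.0 for nonrespondents).
    Assumes every cell contains at least one respondent.
    """
    total = defaultdict(float)      # weight of all sampled units per cell
    responding = defaultdict(float)  # weight of respondents per cell
    for u in units:
        total[u["cell"]] += u["weight"]
        if u["responded"]:
            responding[u["cell"]] += u["weight"]
    adjusted = []
    for u in units:
        if u["responded"]:
            factor = total[u["cell"]] / responding[u["cell"]]
            adjusted.append(u["weight"] * factor)
        else:
            adjusted.append(0.0)
    return adjusted


# Hypothetical sample: cell A has a 50% response rate, cell B has full response.
sample = (
    [{"cell": "A", "weight": 10.0, "responded": r} for r in (True, True, False, False)]
    + [{"cell": "B", "weight": 5.0, "responded": True} for _ in range(3)]
)
print(adjust_weights(sample))
```

The adjusted weights of the respondents in each cell add up to the original weight of the whole cell, so estimated totals are preserved while nonrespondents drop out.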

The Prediction Performance of the CART Using Bank and Insurance Company Data (CART의 예측 성능:은행 및 보험 회사 데이터 사용)

  • Park, Jeong-Seon
    • The Transactions of the Korea Information Processing Society / v.3 no.6 / pp.1468-1472 / 1996
  • In this study, the performance of CART (Classification and Regression Tree) is compared with that of the discriminant analysis method. In most experiments using bank data, discriminant analysis shows better performance in terms of total cost. In contrast, most experiments using insurance data show that CART is better than discriminant analysis in terms of total cost. The contradictory results are analysed using the characteristics of the data sets. The performance of both CART and discriminant analysis depends on the following parameters: failure prior probability, data used, type I error, type II error cost, and validation method.
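
To see how the ranking of two classifiers by total cost can depend on the prior probability and the error costs, consider the minimal sketch below. It assumes a common expected-cost formula and, as a labeled assumption, treats a type I error as classifying a failing case as non-failing; the abstract does not give the paper's exact definitions, and all rates and costs here are hypothetical.

```python
def expected_cost(prior_fail, type1_rate, type2_rate, cost1, cost2):
    """Expected misclassification cost per case.

    Assumption: type I = failing case classified as non-failing,
                type II = non-failing case classified as failing.
    """
    return prior_fail * type1_rate * cost1 + (1.0 - prior_fail) * type2_rate * cost2


# Hypothetical error rates for two classifiers A and B.
a = {"type1_rate": 0.30, "type2_rate": 0.05}
b = {"type1_rate": 0.10, "type2_rate": 0.20}

for cost1 in (1.0, 100.0):
    cost_a = expected_cost(0.02, a["type1_rate"], a["type2_rate"], cost1, 1.0)
    cost_b = expected_cost(0.02, b["type1_rate"], b["type2_rate"], cost1, 1.0)
    print(cost1, round(cost_a, 3), round(cost_b, 3))
```

With equal costs classifier A is cheaper, but once missing a failure is made 100 times as costly classifier B wins, so a comparison in terms of total cost is only meaningful for stated priors and costs.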

Dynamic Caching Routing Strategy for LEO Satellite Nodes Based on Gradient Boosting Regression Tree

  • Yang Yang;Shengbo Hu;Guiju Lu
    • Journal of Information Processing Systems / v.20 no.1 / pp.131-147 / 2024
  • A routing strategy based on traffic prediction and dynamic cache allocation for satellite nodes is proposed to address the issues of high propagation delay and overall delay of inter-satellite and satellite-to-ground links in low Earth orbit (LEO) satellite systems. The spatial and temporal correlations of satellite network traffic were analyzed, and the relevant traffic through the target satellite was extracted as raw input for traffic prediction. An improved gradient boosting regression tree algorithm was used for traffic prediction. Based on the traffic prediction results, a dynamic cache allocation routing strategy is proposed. The satellite nodes periodically monitor the traffic load on inter-satellite links (ISLs) and dynamically allocate cache resources for each ISL with neighboring nodes. Simulation results demonstrate that the proposed routing strategy effectively reduces packet loss rate and average end-to-end delay and improves the distribution of services across the entire network.
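
For readers unfamiliar with gradient boosting regression trees, the sketch below shows the plain textbook version with squared-error loss: each round fits a depth-1 tree (a stump) to the current residuals and adds a shrunken copy of it to the prediction. It is not the improved algorithm used in the paper, whose details the abstract does not give; the function names, parameters, and the toy load series are hypothetical.

```python
def fit_stump(x, residuals):
    """Depth-1 regression tree: threshold and two leaf means minimizing squared error."""
    best = None
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right leaf empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]


def fit_gbrt(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: residuals are the negative gradients."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [p + learning_rate * (lm if xi <= t else rm) for xi, p in zip(x, pred)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, xi):
    """Sum the base value and the shrunken contribution of every stump."""
    out = model["base"]
    for t, lm, rm in model["stumps"]:
        out += model["learning_rate"] * (lm if xi <= t else rm)
    return out


# Hypothetical load series: low traffic in time slots 0-4, high traffic in slots 5-9.
slots = list(range(10))
load = [1.0] * 5 + [5.0] * 5
model = fit_gbrt(slots, load)
print(round(predict(model, 2), 2), round(predict(model, 7), 2))
```

The learning rate trades off the number of rounds against the risk of overfitting; smaller values usually need more rounds to reach the same training error.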