• Title/Summary/Keyword: 회귀나무 (regression tree)

Identifying Influencing Factors of Soldiers' Depression using Multiple Regression and CART (다중회귀와 회귀나무를 활용한 군인 우울 요인 분석)

  • Woo, Chung Hee;PARK, JU YOUNG;Lee, Yujeong
    • Proceedings of the Korea Contents Association Conference / 2013.05a / pp.171-172 / 2013
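  • Depression has been cited as a major cause of suicide, one of the extreme incidents that occur in the military. This study aimed to identify the levels of depression, anxiety, and self-esteem among soldiers, to explore the factors influencing depression, and to compare the effectiveness of multiple regression analysis, which has mainly been used to predict these factors, with that of the regression tree model, which is known as an effective decision-making method. Methods: This was a cross-sectional survey study; depression was measured with the CES-D, anxiety with the SAI, and self-esteem with Rosenberg's (1965) instrument. The participants were soldiers serving in front-line units in Gangwon Province, and 534 questionnaires were returned. Hierarchical multiple regression analysis and a regression tree model were run using SPSS/WIN 18.0. Results: The participants' levels of depression, anxiety, and self-esteem were $10.7({\pm}9.8)$, $38.5({\pm}10.2)$, and $31.7({\pm}5.2)$, respectively. Of the participants, 23.6% (126) showed mild depression. In the multiple regression analysis, the factors influencing depression were anxiety, self-esteem, and length of service, which together explained 62.0% of the variance in depression. In the regression tree model, the groups predicted to be at risk of depression were those with high anxiety and those with somewhat lower anxiety but an uncertain career path after discharge. Conclusion: Anxiety was the main factor influencing depression in the participants of this study. Anxiety-management methods that can be applied within the military need to be developed. The two methods differed on some factors, so replication studies are needed, but because both identified anxiety as the main variable, multiple regression analysis and the regression tree model both appear to be useful for predicting depression in soldiers.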

A Study on Variable Selection Bias in Regression Trees (회귀나무에서 변수선택 편의에 관한 연구)

  • Kim, Min-Ho;Kim, Jin-Heum
    • Proceedings of the Korean Statistical Society Conference / 2003.10a / pp.263-268 / 2003
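  • Regression trees built with the exhaustive search method of Breiman, Friedman, Olshen and Stone (1984) suffer from a bias in which variables that allow relatively many splits tend to be chosen as the split variable. This study proposes an algorithm that addresses this problem in order to build regression trees free of variable selection bias. The proposed algorithm consists of a step that selects the split variable for a node and a step that finds the split point for a binary split on the selected variable. To find the predictor most closely associated with the target variable, a test based on Spearman's rank correlation coefficient or a test based on the Kruskal-Wallis statistic is performed, depending on the data type of the predictor, and the most statistically significant variable is selected; the exhaustive search of Breiman et al. (1984) is then applied only to the selected variable to determine the split point. In a simulation study, the proposed algorithm is compared with the CART (Classification and Regression Trees) of Breiman et al. (1984) in terms of variable selection bias, variable selection power, and mean squared error. The two algorithms are also applied to real data to compare their efficiency.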
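
The two-step idea in this abstract (first choose the split variable by a rank-based association test, then search for the split point on that variable only) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes every predictor is numeric, so only the Spearman step is shown and the Kruskal-Wallis step for categorical predictors is omitted, and it ranks predictors by |rho| instead of by p-value, which gives the same ordering when all predictors have the same sample size. The function names and the toy data are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def sse(ys):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def best_split_point(x, y):
    """Exhaustive search over midpoints between consecutive distinct x values."""
    pairs = sorted(zip(x, y))
    best_err, best_cut = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut point between equal x values
        err = sse([p[1] for p in pairs[:i]]) + sse([p[1] for p in pairs[i:]])
        if err < best_err:
            best_err, best_cut = err, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut


def select_and_split(X, y):
    """Step 1: pick the predictor with the largest |rho|. Step 2: split on it."""
    name = max(X, key=lambda k: abs(spearman(X[k], y)))
    return name, best_split_point(X[name], y)


# Hypothetical toy data: y jumps when x1 exceeds 4; x2 is unrelated noise.
X = {"x1": [1, 2, 3, 4, 5, 6, 7, 8], "x2": [5, 1, 4, 2, 8, 3, 7, 6]}
y = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
split_var, cut = select_and_split(X, y)
```

Because the split-point search runs on one variable only, a predictor with many possible cut points gets no extra chances to win the variable-selection step, which is the source of the bias described above.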

Panel data analysis with regression trees (회귀나무 모형을 이용한 패널데이터 분석)

  • Chang, Youngjae
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1253-1262 / 2014
  • A regression tree is a tree-structured solution in which a simple regression model is fitted to the data in each node produced by recursive partitioning of the predictor space. There have been many efforts to apply tree algorithms to various regression problems such as logistic regression and quantile regression. Recently, the algorithms have been extended to panel data analysis, for example the RE-EM algorithm of Sela and Simonoff (2012) and the extension of GUIDE by Loh and Zheng (2013). This paper briefly introduces the algorithms and compares the prediction accuracy of three methods. In general, RE-EM shows good prediction accuracy, with the smallest MSEs in the simulation study. An RE-EM tree fitted to business survey index (BSI) panel data shows that the sales BSI is the main factor affecting business entrepreneurs' economic sentiment. Within the relatively high sales group, the economic sentiment BSI of non-manufacturing industries is higher than that of manufacturing industries.

Analysis of AI interview data using unified non-crossing multiple quantile regression tree model (통합 비교차 다중 분위수회귀나무 모형을 활용한 AI 면접체계 자료 분석)

  • Kim, Jaeoh;Bang, Sungwan
    • The Korean Journal of Applied Statistics / v.33 no.6 / pp.753-762 / 2020
  • With increasing interest in integrating artificial intelligence (AI) into interview processes, the Republic of Korea (ROK) Army is trying to lead and analyze an AI-powered interview platform. This study analyzes the AI interview data using a unified non-crossing multiple quantile regression tree (UNQRT) model. Compared with the UNQRT, existing models such as quantile regression and the quantile regression tree (QRT) model are inadequate for analyzing AI interview data. Specifically, the linearity assumption of quantile regression is overly strong for this application. While the QRT model seems applicable because it relaxes the linearity assumption, it suffers from crossing among the estimated quantile functions and leads to an uninterpretable model. The UNQRT circumvents the crossing problem by estimating multiple quantile functions simultaneously under a non-crossing constraint, and it is robust at extreme quantiles. Furthermore, the UNQRT builds a single tree, which yields a more interpretable model than the QRT. In this study, we used the UNQRT to explore the relationship between the results of the Army AI interview system and existing personnel data and to derive meaningful results.
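
To make the crossing problem concrete, here is a minimal sketch of the check (pinball) loss that quantile regression minimizes, and a simple test for whether separately estimated quantile predictions cross. It does not implement the UNQRT; the function names and the example predictions are hypothetical and only illustrate what a non-crossing constraint rules out.

```python
def pinball_loss(y, q, tau):
    """Average check loss of predicting the constant q at quantile level tau.

    Each residual u = v - q is weighted by tau when u >= 0 and by (tau - 1)
    when u < 0, so the loss is always non-negative.
    """
    return sum((tau if v >= q else tau - 1) * (v - q) for v in y) / len(y)


def has_crossing(preds_by_tau):
    """Return True if a lower-level quantile exceeds a higher-level one.

    preds_by_tau maps each quantile level tau to a list of predictions,
    one per observation, in the same observation order.
    """
    taus = sorted(preds_by_tau)
    for lo, hi in zip(taus, taus[1:]):
        if any(a > b for a, b in zip(preds_by_tau[lo], preds_by_tau[hi])):
            return True
    return False


# Hypothetical predictions for two observations at three quantile levels.
separate_fits = {0.1: [1.0, 2.0], 0.5: [1.5, 1.8], 0.9: [2.0, 3.0]}
constrained_fits = {0.1: [1.0, 1.5], 0.5: [1.5, 1.8], 0.9: [2.0, 3.0]}

crossing_in_separate = has_crossing(separate_fits)        # 2.0 > 1.8 for obs. 2
crossing_in_constrained = has_crossing(constrained_fits)
```

In the first set of predictions the 10% quantile for the second observation lies above its median, which is the kind of result that makes separately fitted quantile models hard to interpret.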

A Combined Multiple Regression Trees Predictor for Screening Large Chemical Databases (대용량 화학 데이터 베이스를 선별하기위한 결합다중회귀나무 예측치)

  • 임용빈;이소영;정종희
    • The Korean Journal of Applied Statistics / v.14 no.1 / pp.91-101 / 2001
  • It has been shown that multiple-tree predictors achieve lower test set error than a single-tree predictor. There are two ways of generating multiple trees. One is to generate modified training sets by resampling the original training set and then construct a tree on each; the arcing algorithm is known to be efficient for this. The other is to randomly perturb the working split at each node, choosing from a list of the best splits, which is expected to generate reasonably good trees for the original training set. We propose a new combined multiple regression trees predictor, which uses the latter multiple regression tree predictor as the predictor based on the modified training set at each stage of arcing. The efficiency of these prediction methods is compared by applying them to high-throughput screening of chemical compounds for biological effects.
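
The first way of generating multiple trees, resampling the training set and combining the resulting trees, can be sketched as follows. This is a simplified stand-in, assuming plain bootstrap resampling with equal weights (bagging) and one-split trees (stumps) on a single numeric predictor; it is not the arcing algorithm, which resamples adaptively, and not the combined predictor proposed in the paper. All names and data are hypothetical.

```python
import random


def fit_stump(x, y):
    """Fit a one-split regression tree; returns (cut, left_mean, right_mean)."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut point between equal x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or err < best[0]:
            best = (err, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:
        # Every x in this resample is identical: predict the overall mean.
        m = sum(y) / len(y)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x0):
    cut, left_mean, right_mean = stump
    return left_mean if x0 <= cut else right_mean


def bagged_stumps(x, y, n_trees=25, seed=0):
    """Fit one stump per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(x)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x0):
    """Combine the trees by averaging their predictions."""
    return sum(predict_stump(s, x0) for s in stumps) / len(stumps)


# Hypothetical step-shaped data.
x = list(range(10))
y = [0.0] * 5 + [10.0] * 5
stumps = bagged_stumps(x, y)
low, high = predict_bagged(stumps, 0), predict_bagged(stumps, 9)
```

Replacing the equal-probability resampling in `bagged_stumps` with weights that grow for poorly predicted cases would move the sketch toward arcing-style adaptive resampling.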

Multivariate quantile regression tree (다변량 분위수 회귀나무 모형에 대한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Bang, Sungwan
    • Journal of the Korean Data and Information Science Society / v.28 no.3 / pp.533-545 / 2017
  • Quantile regression models provide a variety of useful statistical information by estimating the conditional quantile function of the response variable. However, the traditional linear quantile regression model can lead to distorted and incorrect results when analysing real data in which the relationship between the explanatory variables and the response variables is nonlinear. Furthermore, as the complexity of the data increases, it becomes necessary to analyse multiple response variables simultaneously and to interpret them in more sophisticated ways. For these reasons, we propose a multivariate quantile regression tree model. In this paper, a new split variable selection algorithm is suggested for a multivariate regression tree model. This algorithm selects the split variable more accurately than the previous method and without significant selection bias. We investigate the performance of the proposed method in both simulation and real data studies.

Interesting Node Finding Criteria for Regression Trees (회귀의사결정나무에서의 관심노드 찾는 분류 기준법)

  • 이영섭
    • The Korean Journal of Applied Statistics / v.16 no.1 / pp.45-53 / 2003
  • Regression trees are a decision tree method used to predict a continuous response. The usual splitting criteria in tree growing are based on a compromise in impurity between the left and right child nodes. The new splitting criteria proposed in this paper no longer split on such a compromise; instead, they pick the more interesting subset and ignore the other. The resulting tree structure may be unbalanced but plausible. It can find an interesting subset as early as possible and express it with a simple clause. As a result, the tree is highly interpretable at the cost of a little accuracy.
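
The contrast between a compromise criterion and a one-sided criterion can be sketched as follows. The abstract does not define "interesting", so this sketch assumes it means a child node with a high mean response; the one-sided rule, the `min_size` parameter, and the toy data are hypothetical illustrations rather than the paper's criteria.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean (node impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def candidate_splits(x, y):
    """List (cut, left_y, right_y) for each midpoint between distinct x values."""
    pairs = sorted(zip(x, y))
    out = []
    for i in range(1, len(pairs)):
        if pairs[i][0] != pairs[i - 1][0]:
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            out.append((cut, [p[1] for p in pairs[:i]], [p[1] for p in pairs[i:]]))
    return out


def best_split_compromise(x, y):
    """Conventional rule: minimize the combined impurity of both children."""
    return min(candidate_splits(x, y), key=lambda s: sse(s[1]) + sse(s[2]))[0]


def best_split_one_sided(x, y, min_size=2):
    """Assumed one-sided rule: maximize the mean of the better child only.

    min_size keeps the rule from isolating a single extreme observation.
    """
    eligible = [s for s in candidate_splits(x, y)
                if len(s[1]) >= min_size and len(s[2]) >= min_size]
    return max(eligible, key=lambda s: max(mean(s[1]), mean(s[2])))[0]


# Hypothetical data with three response levels along x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, 1, 5, 5, 9, 9]
compromise_cut = best_split_compromise(x, y)
one_sided_cut = best_split_one_sided(x, y)
```

On this data the compromise rule separates the low group from everything else, while the one-sided rule isolates the small high-response group first, giving a less balanced split that is easy to state as a single clause such as "x > 6.5".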

Study on noise attenuation according to hedge species (생울타리의 종에 따른 소음감소효과에 관한 연구)

  • Oh, Kwang-Il;Kim, Dong-Pil;Choi, Song-Hyun
    • Korean Journal of Environment and Ecology / v.23 no.3 / pp.272-279 / 2009
  • The purpose of this study is to examine noise attenuation according to hedge species and the thickness of their leaves. The order of their noise-reduction effects, from highest to lowest, was as follows: Osmanthus asiaticus, Camellia japonica, Pyracantha angustifolia, Photinia glabra, Pittosporum tobira, Nandina domestica, Euonymus japonica, Chaenomeles lagenaria, Aucuba japonica for. variegata. The noise attenuation experiment showed that woody plants with thicker leaves performed better than those with thinner leaves. Multiple regression analysis gave Y = 7.653 + 26.530 X ($R^2$ = 0.385). The order of the subjects according to their effect on noise attenuation, from highest to lowest, was as follows: Camellia japonica, Nandina domestica, Pittosporum tobira, Taxus cuspidata, Chaenomeles lagenaria. The noise attenuation level of Camellia japonica was the highest (14.70[dB]) and that of Chaenomeles lagenaria was the lowest (6.80[dB]), a difference of 7.9[dB].

Comparative Analysis of Predictors of Depression for Residents in a Metropolitan City using Logistic Regression and Decision Making Tree (로지스틱 회귀분석과 의사결정나무 분석을 이용한 일 대도시 주민의 우울 예측요인 비교 연구)

  • Kim, Soo-Jin;Kim, Bo-Young
    • The Journal of the Korea Contents Association / v.13 no.12 / pp.829-839 / 2013
  • This descriptive study aimed to predict and compare the factors of depression affecting residents of a metropolitan city by using logistic regression analysis and decision tree analysis. The subjects were 462 residents ($20 \leq \text{age} < 65$) of a metropolitan city. Data were collected between October 7, 2011 and October 21, 2011 and analyzed with frequency analysis, percentages, means and standard deviations, the ${\chi}^2$-test, the t-test, logistic regression analysis, the ROC curve, and a decision tree, using the SPSS 18.0 program. The common predictors of depression in community residents were social dysfunction, perceived physical symptoms, and family support. The specificity and sensitivity of the logistic regression were 93.8% and 42.5%, respectively. The receiver operating characteristic (ROC) curve was used to determine an optimal model. The AUC (area under the curve) was .84, and the ROC curve was statistically significant (p<.001). The specificity and sensitivity of the decision tree analysis were 98.3% and 20.8%, respectively. The overall classification accuracy was 82.0% for the logistic regression and 80.5% for the decision tree analysis. These results suggest that the logistic regression analysis, which showed higher sensitivity and classification accuracy, may be useful for establishing a depression prediction model for community residents.
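
For readers comparing the two classifiers, the sketch below shows how sensitivity, specificity, and overall classification accuracy are derived from a confusion matrix. The counts are hypothetical and are not taken from the study, whose confusion matrices are not given in the abstract.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and accuracy from confusion counts.

    tp: depressed cases predicted as depressed
    fn: depressed cases predicted as not depressed
    tn: non-depressed cases predicted as not depressed
    fp: non-depressed cases predicted as depressed
    """
    return {
        "sensitivity": tp / (tp + fn),          # share of true cases detected
        "specificity": tn / (tn + fp),          # share of non-cases cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=30, fn=70, tn=380, fp=20)
```

With these made-up counts the model is 82% accurate overall yet detects only 30% of depressed cases, the same pattern of high specificity and low sensitivity reported for both models above, which is why accuracy alone is a weak basis for choosing a screening model.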

Application of Regression Tree Model for the Estimation of Groundwater Use at the Agricultural (Dry-field Farming and Rice Farming) Purpose Wells (농업용(전작 및 답작용) 지하수 이용량 추정을 위한 회귀나무 모형의 적용)

  • Kim, Yoo-Bum;Hwang, Chan-Ik
    • The Journal of Engineering Geology / v.29 no.4 / pp.417-425 / 2019
  • Agricultural groundwater use accounts for 51.8% of total groundwater use, so accurate estimation of it is important for efficient groundwater management. The purpose of this study is to develop a method for estimating the groundwater use of agricultural (rice farming and dry-field farming) wells using a regression tree model based on measured data from 370 wells. Three input variables of the model were evaluated as significant: well depth, pipe diameter, and pump capacity. The importance of each variable was 75% for well depth, 17% for pipe diameter, and 8% for pump capacity. The daily usage of agricultural (rice farming and dry-field farming) wells estimated by the regression tree model was much closer to the actual usage than the estimate from the previous method proposed by the Ministry of Construction and Transportation. In the future, the reliability of the usage statistics is expected to improve if additional observed data are secured and this classification method is refined.