• Title/Summary/Keyword: quantile regression tree

Search Result 7, Processing Time 0.017 seconds

Wage Determinants Analysis by Quantile Regression Tree

  • Chang, Young-Jae
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.2
    • /
    • pp.293-301
    • /
    • 2012
  • Quantile regression proposed by Koenker and Bassett (1978) is a statistical technique that estimates conditional quantiles. The advantage of using quantile regression is the robustness in response to large outliers compared to ordinary least squares(OLS) regression. A regression tree approach has been applied to OLS problems to fit flexible models. Loh (2002) proposed the GUIDE algorithm that has a negligible selection bias and relatively low computational cost. Quantile regression can be regarded as an analogue of OLS, therefore it can also be applied to GUIDE regression tree method. Chaudhuri and Loh (2002) proposed a nonparametric quantile regression method that blends key features of piecewise polynomial quantile regression and tree-structured regression based on adaptive recursive partitioning. Lee and Lee (2006) investigated wage determinants in the Korean labor market using the Korean Labor and Income Panel Study(KLIPS). Following Lee and Lee, we fit three kinds of quantile regression tree models to KLIPS data with respect to the quantiles, 0.05, 0.2, 0.5, 0.8, and 0.95. Among the three models, multiple linear piecewise quantile regression model forms the shortest tree structure, while the piecewise constant quantile regression model has a deeper tree structure with more terminal nodes in general. Age, gender, marriage status, and education seem to be the determinants of the wage level throughout the quantiles; in addition, education experience appears as the important determinant of the wage level in the highly paid group.

Analysis of AI interview data using unified non-crossing multiple quantile regression tree model (통합 비교차 다중 분위수회귀나무 모형을 활용한 AI 면접체계 자료 분석)

  • Kim, Jaeoh;Bang, Sungwan
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.6
    • /
    • pp.753-762
    • /
    • 2020
  • With an increasing interest in integrating artificial intelligence (AI) into interview processes, the Republic of Korea (ROK) army is trying to lead and analyze AI-powered interview platform. This study is to analyze the AI interview data using a unified non-crossing multiple quantile tree (UNQRT) model. Compared to the UNQRT, the existing models, such as quantile regression and quantile regression tree model (QRT), are inadequate for the analysis of AI interview data. Specially, the linearity assumption of the quantile regression is overly strong for the aforementioned application. While the QRT model seems to be applicable by relaxing the linearity assumption, it suffers from crossing problems among estimated quantile functions and leads to an uninterpretable model. The UNQRT circumvents the crossing problem of quantile functions by simultaneously estimating multiple quantile functions with a non-crossing constraint and is robust from extreme quantiles. Furthermore, the single tree construction from the UNQRT leads to an interpretable model compared to the QRT model. In this study, by using the UNQRT, we explored the relationship between the results of the Army AI interview system and the existing personnel data to derive meaningful results.

Variable selection with quantile regression tree (분위수 회귀나무를 이용한 변수선택 방법 연구)

  • Chang, Youngjae
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.6
    • /
    • pp.1095-1106
    • /
    • 2016
  • The quantile regression method proposed by Koenker et al. (1978) focuses on conditional quantiles given by independent variables, and analyzes the relationship between response variable and independent variables at the given quantile. Considering the linear programming used for the estimation of quantile regression coefficients, the model fitting job might be difficult when large data are introduced for analysis. Therefore, dimension reduction (or variable selection) could be a good solution for the quantile regression of large data sets. Regression tree methods are applied to a variable selection for quantile regression in this paper. Real data of Korea Baseball Organization (KBO) players are analyzed following the variable selection approach based on the regression tree. Analysis result shows that a few important variables are selected, which are also meaningful for the given quantiles of salary data of the baseball players.

Penalized quantile regression tree (벌점화 분위수 회귀나무모형에 대한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Bang, Sungwan
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.7
    • /
    • pp.1361-1371
    • /
    • 2016
  • Quantile regression provides a variety of useful statistical information to examine how covariates influence the conditional quantile functions of a response variable. However, traditional quantile regression (which assume a linear model) is not appropriate when the relationship between the response and the covariates is a nonlinear. It is also necessary to conduct variable selection for high dimensional data or strongly correlated covariates. In this paper, we propose a penalized quantile regression tree model. The split rule of the proposed method is based on residual analysis, which has a negligible bias to select a split variable and reasonable computational cost. A simulation study and real data analysis are presented to demonstrate the satisfactory performance and usefulness of the proposed method.

Multivariate quantile regression tree (다변량 분위수 회귀나무 모형에 대한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Bang, Sungwan
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.3
    • /
    • pp.533-545
    • /
    • 2017
  • Quantile regression models provide a variety of useful statistical information by estimating the conditional quantile function of the response variable. However, the traditional linear quantile regression model can lead to the distorted and incorrect results when analysing real data having a nonlinear relationship between the explanatory variables and the response variables. Furthermore, as the complexity of the data increases, it is required to analyse multiple response variables simultaneously with more sophisticated interpretations. For such reasons, we propose a multivariate quantile regression tree model. In this paper, a new split variable selection algorithm is suggested for a multivariate regression tree model. This algorithm can select the split variable more accurately than the previous method without significant selection bias. We investigate the performance of our proposed method with both simulation and real data studies.

Panel data analysis with regression trees (회귀나무 모형을 이용한 패널데이터 분석)

  • Chang, Youngjae
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.6
    • /
    • pp.1253-1262
    • /
    • 2014
  • Regression tree is a tree-structured solution in which a simple regression model is fitted to the data in each node made by recursive partitioning of predictor space. There have been many efforts to apply tree algorithms to various regression problems like logistic regression and quantile regression. Recently, algorithms have been expanded to the panel data analysis such as RE-EM algorithm by Sela and Simonoff (2012), and extension of GUIDE by Loh and Zheng (2013). The algorithms are briefly introduced and prediction accuracy of three methods are compared in this paper. In general, RE-EM shows good prediction accuracy with least MSE's in the simulation study. A RE-EM tree fitted to business survey index (BSI) panel data shows that sales BSI is the main factor which affects business entrepreneurs' economic sentiment. The economic sentiment BSI of non-manufacturing industries is higher than that of manufacturing ones among the relatively high sales group.

Particulate Matter Prediction using Quantile Boosting (분위수 부스팅을 이용한 미세먼지 농도 예측)

  • Kwon, Jun-Hyeon;Lim, Yaeji;Oh, Hee-Seok
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.1
    • /
    • pp.83-92
    • /
    • 2015
  • Concerning the national health, it is important to develop an accurate prediction method of atmospheric particulate matter (PM) because being exposed to such fine dust can trigger not only respiratory diseases as well as dermatoses, ophthalmopathies and cardiovascular diseases. The National Institute of Environmental Research (NIER) employs a decision tree to predict bad weather days with a high PM concentration. However, the decision tree method (even with the inherent unstableness) cannot be a suitable model to predict bad weather days which represent only 4% of the entire data. In this paper, while presenting the inaccuracy and inappropriateness of the method used by the NIER, we present the utility of a new prediction model which adopts boosting with quantile loss functions. We evaluate the performance of the new method over various ${\tau}$-value's and justify the proposed method through comparison.