Penalized quantile regression tree

A study on the penalized quantile regression tree model

  • Kim, Jaeoh (Department of Statistics, Korea University) ;
  • Cho, HyungJun (Department of Statistics, Korea University) ;
  • Bang, Sungwan (Department of Mathematics, Korea Military Academy)
  • Received : 2016.09.02
  • Accepted : 2016.10.31
  • Published : 2016.12.31

Abstract

Quantile regression provides a variety of useful statistical information for examining how covariates influence the conditional quantile functions of a response variable. However, traditional quantile regression, which assumes a linear model, is not appropriate when the relationship between the response and the covariates is nonlinear. Variable selection is also necessary for high-dimensional data or strongly correlated covariates. In this paper, we propose a penalized quantile regression tree model. The split rule of the proposed method is based on residual analysis, which has negligible bias in selecting the split variable and a reasonable computational cost. A simulation study and a real data analysis are presented to demonstrate the satisfactory performance and usefulness of the proposed method.

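Quantile regression provides much useful information by exploring how covariates relate to the conditional quantile functions of a response variable. However, when the covariates and the response are nonlinearly related, traditional quantile regression, which assumes a linear form, is not appropriate. In addition, a variable selection method is needed for high-dimensional data or for data with highly correlated covariates. For these reasons, this study proposes a penalized quantile regression tree model. The split rule of the proposed method is based on residual analysis, which overcomes the problems of excessive computation time and split-variable selection bias. Simulations and a real-data example confirm the superior performance and usefulness of the proposed method.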
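
To make the ingredients named in the abstract concrete, the following is a minimal sketch of the check (pinball) loss of Koenker and Bassett and of a penalized linear quantile regression objective of the kind that could be fitted within a tree node. The abstract does not state which penalty the paper uses or how the objective is scaled, so the L1 (lasso-type) penalty, the averaging over observations, and the function names `check_loss` and `penalized_quantile_objective` are all assumptions made for illustration only.

```python
import numpy as np

def check_loss(u, tau):
    """Check (pinball) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    u = np.asarray(u, dtype=float)
    return u * np.where(u < 0, tau - 1.0, tau)

def penalized_quantile_objective(beta0, beta, X, y, tau, lam):
    """Mean check loss of a linear fit plus an L1 penalty on the slopes.

    The L1 penalty and the 1/n scaling are illustrative assumptions,
    not necessarily the choices made in the paper.
    """
    residuals = y - beta0 - X @ beta
    return check_loss(residuals, tau).mean() + lam * np.abs(beta).sum()

# Sanity check: with no covariates, the constant that minimizes the
# total check loss is a sample tau-quantile of y.
rng = np.random.default_rng(0)
y = rng.normal(size=201)
tau = 0.25
candidates = np.sort(y)
losses = [check_loss(y - c, tau).sum() for c in candidates]
best_constant = candidates[int(np.argmin(losses))]
```

Here `best_constant` is found by brute force over the observed values, which is enough to show that the check loss targets a quantile rather than the mean; an actual fit would use linear programming or a dedicated solver.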
