Multivariate quantile regression tree

A study on the multivariate quantile regression tree model

  • Kim, Jaeoh (Department of Statistics, Korea University) ;
  • Cho, HyungJun (Department of Statistics, Korea University) ;
  • Bang, Sungwan (Department of Mathematics, Korea Military Academy)
  • Received : 2017.04.18
  • Accepted : 2017.05.27
  • Published : 2017.05.31

Abstract

Quantile regression models provide a variety of useful statistical information by estimating the conditional quantile function of the response variable. However, the traditional linear quantile regression model can lead to distorted and incorrect results when analysing real data in which the relationship between the explanatory variables and the response variable is nonlinear. Furthermore, as data become more complex, there is an increasing need to analyse multiple response variables simultaneously and to interpret them in a more sophisticated way. For these reasons, we propose a multivariate quantile regression tree model. In this paper, a new split variable selection algorithm is suggested for the multivariate regression tree model; it selects the split variable more accurately than the previous method and does not suffer from significant selection bias. We investigate the performance of the proposed method through both simulation and real data studies.
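As a concrete illustration of the conditional quantile estimation mentioned above: the τ-th quantile is the minimizer of the expected check (pinball) loss. The NumPy sketch below is purely illustrative, not code from the paper, and the function name check_loss is ours; it verifies numerically that the constant minimizing the summed check loss over a sample coincides with the empirical τ-th quantile.

```python
import numpy as np

def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0, tau * u, (tau - 1.0) * u)

# Toy illustration: the constant c minimizing sum_i rho_tau(y_i - c)
# is (approximately) the empirical tau-th quantile of y.
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
tau = 0.75
grid = np.linspace(y.min(), y.max(), 801)
total_loss = [check_loss(y - c, tau).sum() for c in grid]
c_star = grid[int(np.argmin(total_loss))]
print(round(c_star, 3), round(np.quantile(y, tau), 3))  # nearly identical values
```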

The quantile regression model provides comprehensive and useful statistical information about the conditional distribution of the response variable. In many real data sets, however, the relationship between the explanatory variables and the response variable is nonlinear, so the traditional linear quantile regression model can produce distorted and incorrect results. Moreover, as data grow more complex, there is an increasing demand for richer interpretation, together with more accurate prediction, in the analysis of multivariate data with several response variables. For these reasons, this study proposes a multivariate quantile regression tree model. We point out a problem in the split variable selection algorithm of the existing multivariate regression tree model and propose an improved split variable selection algorithm. The proposed algorithm can be applied within a reasonable computation time and selects the split variable more accurately than the existing method, without suffering from selection bias. Simulation studies and a real data example confirm the good performance and usefulness of the proposed method.
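To show what a split variable selection step involves, the following sketch implements the classical exhaustive search used by standard multivariate regression trees: every candidate (variable, cut point) pair is scored by the pooled within-child sum of squares over all response columns, and the pair with the smallest total is chosen. This is a generic baseline given only for illustration, not the improved selection algorithm proposed in the paper; the function names node_sse and exhaustive_best_split are ours.

```python
import numpy as np

def node_sse(Y):
    """Within-node sum of squared deviations, pooled over all response columns."""
    return float(((Y - Y.mean(axis=0)) ** 2).sum())

def exhaustive_best_split(X, Y, min_leaf=5):
    """Return (variable index, cut point, total child SSE) of the best binary split."""
    best_j, best_cut, best_sse = None, None, np.inf
    for j in range(X.shape[1]):
        for cut in np.unique(X[:, j])[:-1]:          # candidate cut points
            left = X[:, j] <= cut
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            sse = node_sse(Y[left]) + node_sse(Y[~left])
            if sse < best_sse:
                best_j, best_cut, best_sse = j, float(cut), sse
    return best_j, best_cut, best_sse

# Example: two responses that both shift when the first explanatory variable exceeds 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
Y = np.column_stack([(X[:, 0] > 0) + rng.normal(scale=0.3, size=200),
                     2 * (X[:, 0] > 0) + rng.normal(scale=0.3, size=200)])
print(exhaustive_best_split(X, Y))  # expected to pick variable 0 with a cut near 0
```

Because this greedy search evaluates more candidate cut points for variables with many distinct values, it tends to favour such variables; this is the kind of selection bias that an improved split variable selection algorithm aims to avoid.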

Acknowledgement

Supported by: 한국연구재단 (National Research Foundation of Korea)
