Variable Selection with Regression Trees

  • Received : 2010.01
  • Accepted : 2010.02
  • Published : 2010.04.30

Abstract

Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most of them suffer a loss of prediction accuracy when many noise variables are present. To handle this problem, we propose the multi-step GUIDE, a regression tree algorithm with a variable selection process. The multi-step GUIDE performs better than well-known algorithms such as Random Forest and MARS. Simulation results show that the multi-step GUIDE outperforms the other algorithms in terms of both variable selection and prediction accuracy: it generally selects the important variables correctly, with relatively few noise variables, and consequently gives good prediction accuracy.
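
As a rough illustration of the multi-step idea described above (fit a regression tree, keep only the variables it actually uses, then refit on the reduced set), the following Python sketch uses scikit-learn's CART regression tree as a stand-in for GUIDE, which is distributed as separate software (Loh, 2002). The simulated data, the importance-based selection rule, and all names here are hypothetical; this is not the authors' algorithm, only a sketch of the workflow.

    # Minimal sketch of a multi-step variable selection with a regression tree.
    # Assumes scikit-learn is installed; CART is used as a stand-in for GUIDE.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    n, p = 500, 20                      # 20 predictors, most of them noise
    X = rng.normal(size=(n, p))
    y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=n)

    # Step 1: fit a tree on all candidate variables.
    full_tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

    # Step 2: variable selection -- keep predictors with nonzero importance,
    # i.e. those the tree actually split on.
    selected = np.flatnonzero(full_tree.feature_importances_ > 0)
    print("selected variables:", selected)

    # Step 3: refit the tree using only the selected variables.
    final_tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X[:, selected], y)
    print("training R^2 on selected variables:", final_tree.score(X[:, selected], y))

In a sketch like this, the noise variables that never enter a split are dropped before the final fit, which mimics the benefit the abstract attributes to the variable selection step.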

References

  1. Belsley, D. A. (1980). On the efficient computation of the nonlinear full-information maximum-likelihood estimator, Journal of Econometrics, 14, 203-225. https://doi.org/10.1016/0304-4076(80)90091-3
  2. Breiman, L. (2001). Random Forests, Machine Learning, 45, 5-32. https://doi.org/10.1023/A:1010933404324
  3. Chattopadhyay, S. (2003). Divergence between the Hicksian welfare measures: The case of revealed preference for public amenities, Journal of Applied Econometrics, 17, 641-66.
  4. Cook, D. and Weisberg, S. (1994). An Introduction to Regression Graphics, Wiley, New York.
  5. Denman, N. and Gregory, D. (1998). Analysis of sugar cane yields in the Mulgrave area for the 1997 sugar cane season, Technical Report, MS305 Data Analysis Project, Department of Mathematics, University of Queensland.
  6. Doksum, K., Tang, S. and Tsui, K. W. (2008). Nonparametric variable selection: The EARTH algorithm, Journal of the American Statistical Association, 103, 1609-1620. https://doi.org/10.1198/016214508000000878
  7. Friedman, J. H. (1991). Multivariate adaptive regression splines, Annals of Statistics, 19, 1-67. https://doi.org/10.1214/aos/1176347963
  8. Kenkel, D. and Terza, J. (2001). The effect of physician advice on alcohol consumption: Count regression with an endogenous treatment effect, Journal of Applied Econometrics, 16, 165-184. https://doi.org/10.1002/jae.596
  9. Liu, Z. and Stengos, T. (1999). Non-linearities in cross country growth regressions: A semiparametric approach, Journal of Applied Econometrics, 14, 527-538. https://doi.org/10.1002/(SICI)1099-1255(199909/10)14:5<527::AID-JAE528>3.0.CO;2-X
  10. Loh, W. Y. (2002). Regression trees with unbiased variable selection and interaction detection, Statistica Sinica, 12, 361-386.
  11. Onoyama, K., Ohsumi, N., Mitsumochi, N. and Kishihara, T. (1998). Data analysis of deer-train collisions in eastern Hokkaido, in Data Science, Classification, and Related Methods (eds. Hayashi, C., Ohsumi, N., Yajima, K., Tanaka, Y., Bock, H.-H. and Baba, Y.), 746-751, Japan.
  12. Svetnik, V., Liaw, A., Tong, C. and Culberson, J. C. (2003). Random forest: A classification and regression tool for compound classification and QSAR modeling, Journal of Chemical Information and Computer Sciences, 43, 1947-1958. https://doi.org/10.1021/ci034160g

Cited by

  1. Multi-Step Classification Trees vol.41, pp.9, 2012, https://doi.org/10.1080/03610918.2011.624238