http://dx.doi.org/10.5351/KJAS.2010.23.2.357

Variable Selection with Regression Trees  

Chang, Young-Jae (Research Department, The Bank of Korea)
Publication Information
The Korean Journal of Applied Statistics, v.23, no.2, 2010, pp. 357-366
Abstract
Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most of them suffer a loss of prediction accuracy when many noise variables are present. To handle this problem, we propose the multi-step GUIDE, a regression tree algorithm with a variable selection process. The multi-step GUIDE performs better than well-known algorithms such as Random Forest and MARS. The results of the simulation study show that the multi-step GUIDE outperforms the other algorithms in terms of both variable selection and prediction accuracy: it generally selects the important variables correctly, with relatively few noise variables, and consequently gives good prediction accuracy.
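The abstract describes a two-step idea: screen out noise variables first, then grow the regression tree on the variables that survive the screening. The sketch below illustrates only that general pipeline; it is not the paper's multi-step GUIDE procedure (GUIDE's unbiased split-variable selection is not available in scikit-learn), and the importance-based screener, the cutoff, and the synthetic data are assumptions made purely for illustration.

# Minimal sketch of a "select variables, then grow a tree" pipeline.
# NOTE: scikit-learn's CART tree and a random-forest importance screen are
# stand-ins; the paper's multi-step GUIDE uses GUIDE's own selection step.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data: 5 signal variables plus 45 pure-noise variables.
n, p_signal, p_noise = 500, 5, 45
X = rng.normal(size=(n, p_signal + p_noise))
y = X[:, :p_signal] @ np.array([3.0, -2.0, 1.5, 1.0, 0.5]) + rng.normal(size=n)

# Step 1: rank variables by importance and keep those above an
# illustrative cutoff (an assumed stand-in for the paper's selection step).
screener = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importance = screener.feature_importances_
selected = np.flatnonzero(importance > importance.mean())
print("selected variables:", selected)

# Step 2: grow the final regression tree on the selected variables only.
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X[:, selected], y)
print("training R^2:", tree.score(X[:, selected], y))

With many noise columns, the screening step typically retains only the few signal variables, so the final tree splits on a small, relevant subset. This is the behaviour the abstract attributes to the multi-step GUIDE.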
Keywords
Regression tree; random forest; variable selection; bagging
References
1 Belsley, D. A. (1980). On the efficient computation of the nonlinear full-information maximum-likelihood estimator, Journal of Econometrics, 14, 203-225.
2 Breiman, L. (2001). Random forests, Machine Learning, 45, 5-32.
3 Chattopadhyay, S. (2003). Divergence Between the Hicksian Welfare Measures: The Case of Revealed Preference for Public Amenities, Journal of Applied Econometrics, 17, 641-66.
4 Cook, D. and Weisberg, S. (1994). An Introduction to Regression Graphics, Wiley, New York.
5 Denman, N. and Gregory, D. (1998). Analysis of sugar cane yields in the Mulgrave area for the 1997 sugar cane season, Technical report, MS305 Data Analysis Project, Department of Mathematics, University of Queensland.
6 Doksum, K., Tang, S. and Tsui, K. W. (2006). Nonparametric variable selection: The EARTH algorithm, Journal of the American Statistical Association, 103, 1609-1620.
7 Friedman, J. H. (1991). Multivariate adaptive regression splines, Annals of Statistics, 19, 1-67.
8 Kenkel, D. and Terza, J. (2001). The effect of physician advice on alcohol consumption: count regression with an endogenous treatment effect, Journal of Applied Econometrics, 16, 165-184.
9 Liu, Z. and Stengos, T. (1999). Non-linearities in cross country growth regressions: A semiparametric approach, Journal of Applied Econometrics, 14, 527-538.
10 Loh, W. Y. (2002). Regression trees with unbiased variable selection and interaction detection, Statistica Sinica, 12, 361-386.
11 Svetnik, V., Liaw, A., Tong, C. and Culberson, J. C. (2003). Random forest: A classification and regression tool for compound classification and QSAR modeling, Journal of Chemical Information and Computer Sciences, 43, 1947-1958.
12 Onoyama, K., Ohsumi, N., Mitsumochi, N. and Kishihara, T. (1998). Data analysis of deer-train collisions in eastern Hokkaido, Data Science, Classification, and Related Methods (ed. by Hayashi, C., Ohsumi, N., Yajima, K., Tanaka, Y., Bock, H.-H., Baba, Y.), 746-751, Japan.