http://dx.doi.org/10.5351/KJAS.2004.17.3.459

Regression Trees with Unbiased Variable Selection

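Kim, Jinheum (Department of Statistics and Information, College of Natural Sciences, The University of Suwon)
Kim, Minho (Department of Statistics and Information, College of Natural Sciences, The University of Suwon)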
Publication Information
The Korean Journal of Applied Statistics / v.17, no.3, 2004, pp. 459-473
Abstract
It is well known that the exhaustive search algorithm suggested by Breiman et al. (1984) tends to select variables with relatively many possible splits as the split variable. We propose an algorithm that overcomes this variable selection bias and use it to construct unbiased regression trees. The proposed algorithm proceeds in two steps: it first selects a split variable and then determines a binary split rule based on that variable. Simulation studies were performed to compare the proposed algorithm with the CART (Classification and Regression Trees) method of Breiman et al. (1984) in terms of the degree of variable selection bias, variable selection power, and MSE (Mean Squared Error). We also illustrate the proposed algorithm with real data sets.
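The selection bias and the two-step idea described above can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' algorithm: the abstract does not give the selection statistic, so the sketch assumes numeric predictors only and picks the split variable by the largest absolute Spearman rank correlation with the response (the keywords suggest a Kruskal-Wallis test is also involved, presumably for categorical predictors, which is omitted here). All function names (`best_split`, `two_step_choice`, `simulate`, and so on) are hypothetical. The response is generated independently of both predictors, so an unbiased method should choose each predictor about equally often.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Exhaustive search on one numeric predictor: (cut point, SSE reduction)."""
    pairs = sorted(zip(x, y))
    total = sse(y)
    best_cut, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point exists between tied predictor values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        gain = total - sse(left) - sse(right)
        if best_cut is None or gain > best_gain:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = gain
    return best_cut, best_gain


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the midranks."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return cov / den if den > 0 else 0.0


def exhaustive_choice(X, y):
    """CART-style choice: the predictor whose best split reduces SSE the most."""
    gains = [best_split(x, y)[1] for x in X]
    return max(range(len(X)), key=lambda j: gains[j])


def two_step_choice(X, y):
    """Step 1: pick the predictor by |Spearman rho| (an assumption of this sketch).
    Step 2: search for the binary split on that predictor only."""
    scores = [abs(spearman(x, y)) for x in X]
    j = max(range(len(X)), key=lambda k: scores[k])
    return j, best_split(X[j], y)[0]


def simulate(reps=200, n=40, seed=1):
    """Count how often each method selects x1 (binary) or x2 (continuous)
    when y is pure noise, independent of both predictors."""
    rng = random.Random(seed)
    exhaustive, two_step = [0, 0], [0, 0]
    for _ in range(reps):
        x1 = [0, 1] * (n // 2)  # balanced binary predictor: one possible split
        rng.shuffle(x1)
        x2 = [rng.random() for _ in range(n)]  # up to n - 1 possible splits
        y = [rng.gauss(0, 1) for _ in range(n)]
        exhaustive[exhaustive_choice([x1, x2], y)] += 1
        two_step[two_step_choice([x1, x2], y)[0]] += 1
    return exhaustive, two_step


if __name__ == "__main__":
    print(simulate())  # selection counts [x1, x2] for each method
```

Under these assumptions, the exhaustive search should favor the continuous predictor simply because it offers more candidate splits, while the rank-based first step does not depend on the number of candidate splits. A full implementation would compare p-values rather than raw statistics, so that numeric and categorical predictors are placed on a common scale.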
Keywords
CART; Kruskal-Wallis test; Regression trees; Spearman's rank correlation coefficient; Variable selection bias; Variable selection power
Citations & Related Records
1 Eubank, R. L., LaRiccia, V. N., and Rosenstein, R. B. (1987). Test statistics derived as components of Pearson's Phi-squared distance measure, Journal of the American Statistical Association, 82, 816-825
2 Kim, H. and Loh, W. (2001). Classification trees with unbiased multiway splits, Journal of the American Statistical Association, 96, 589-604
3 Lee, Y. M. and Song, M. S. (2002). A study on unbiased methods in constructing classification trees, The Korean Communications in Statistics, 9, 809-824
4 Loh, W. (2002). Regression trees with unbiased variable selection and interaction detection, Statistica Sinica, 12, 361-386
5 Loh, W. and Shih, Y. (1997). Split selection methods for classification trees, Statistica Sinica, 7, 815-840
6 Loh, W. and Vanichsetakul, N. (1988). Tree-structured classification via generalized discriminant analysis (with discussion), Journal of the American Statistical Association, 83, 715-728
7 Randles, R. H. and Wolfe, D. A. (1979). Introduction to the Theory of Nonparametric Statistics, John Wiley and Sons, New York
8 Breiman, L. (1996). Bagging predictors, Machine Learning, 24, 123-140
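9 Song, M. S. and Yoon, Y. J. (2001). A study on variable selection bias in data mining packages, The Korean Journal of Applied Statistics, 14, 475-486 (in Korean)
10 Lee, S. C. and Huh, M. Y. (2003). Measuring association in mixed data by tests of independence, The Korean Journal of Applied Statistics, 16, 151-167 (in Korean)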
11 Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. (1984). Classification and Regression Trees, Wadsworth, Belmont