[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5351/KJAS.2004.17.3.459

Regression Trees with Unbiased Variable Selection

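Jinheum Kim (김진흠), Department of Statistics and Information, College of Natural Sciences, The University of Suwon
Minho Kim (김민호), Department of Statistics and Information, College of Natural Sciences, The University of Suwon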

Publication Information

The Korean Journal of Applied Statistics, v.17, no.3, 2004, pp. 459-473

Abstract

It is well known that the exhaustive search algorithm suggested by Breiman et al. (1984) tends to select variables with relatively many possible splits when choosing a splitting rule. We propose an algorithm that overcomes this variable selection bias and use it to construct unbiased regression trees. The proposed algorithm proceeds in two steps: it first selects a split variable and then determines a binary split rule based on that variable. Simulation studies were performed to compare the proposed algorithm with the CART (Classification and Regression Trees) method of Breiman et al. (1984) in terms of the degree of variable selection bias, variable selection power, and MSE (mean squared error). We also illustrate the proposed algorithm with real data sets.
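The two-step idea described in the abstract can be sketched roughly as follows. This is only an illustrative sketch and not the authors' algorithm: the abstract does not give the details, so the code assumes numeric predictors only, ranks them by the absolute value of Spearman's rank correlation with the response (a measure named in the keywords), and then runs a least-squares search for the cut point on the selected variable alone. The function names (`select_variable`, `best_split`) and the toy data are hypothetical, and the handling of categorical predictors is omitted.

```python
from statistics import mean

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def sse(ys):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)

def select_variable(X, y):
    """Step 1 (assumed form): choose the predictor most associated with y."""
    return max(X, key=lambda name: abs(spearman(X[name], y)))

def best_split(x, y):
    """Step 2: search cut points of the selected variable only, minimizing SSE."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between equal predictor values
        cost = sse([v for _, v in pairs[:i]]) + sse([v for _, v in pairs[i:]])
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or cost < best[1]:
            best = (cut, cost)
    return best

# Hypothetical toy data: x1 drives the response, x2 is unrelated noise.
X = {"x1": [1, 2, 3, 4, 5, 6, 7, 8], "x2": [5, 3, 8, 1, 4, 7, 2, 6]}
y = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0]
chosen = select_variable(X, y)
print(chosen, best_split(X[chosen], y))
```

Separating the choice of variable from the search for the cut point is what keeps a predictor with many candidate splits from being favored merely because it offers more chances to reduce the error.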

Keywords

CART; Kruskal-Wallis test; Regression trees; Spearman's rank correlation coefficient; Variable selection bias; Variable selection power

Citations & Related Records

Times Cited by KSCI: 2

References

1	Eubank, R. L., LaRiccia, V. N., and Rosenstein, R. B. (1987). Test statistics derived as components of Pearson's Phi-squared distance measure, Journal of the American Statistical Association, 82, 816-825
2	Kim, H. and Loh, W. (2001). Classification trees with unbiased multiway splits, Journal of the American Statistical Association, 96, 589-604
3	Lee, Y. M. and Song, M. S. (2002). A study on unbiased methods in constructing classification trees, The Korean Communications in Statistics, 9, 809-824
4	Loh, W. (2002). Regression trees with unbiased variable selection and interaction detection, Statistica Sinica, 12, 361-386
5	Loh, W. and Shih, Y. (1997). Split selection methods for classification trees, Statistica Sinica, 7, 815-840
6	Loh, W. and Vanichsetakul, N. (1988). Tree-structured classification via generalized discriminant analysis (with discussion), Journal of the American Statistical Association, 83, 715-728
7	Randles, R. H. and Wolfe, D. A. (1979). Introduction to the Theory of Nonparametric Statistics, John Wiley and Sons, New York
8	Breiman, L. (1996). Bagging predictors, Machine Learning, 24, 123-140
9	Song, M. S. (송문섭) and Yoon, Y. J. (윤영주) (2001). A study on variable selection bias in data mining packages (in Korean), The Korean Journal of Applied Statistics, 14, 475-486
10	Lee, S. C. (이승천) and Huh, M. Y. (허문열) (2003). Measuring association in mixed data by tests of independence (in Korean), The Korean Journal of Applied Statistics, 16, 151-167
11	Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. (1984). Classification and Regression Trees, Wadsworth, Belmont

Korean title: 변수선택 편향이 없는 회귀나무를 만들기 위한 알고리즘 (An Algorithm for Constructing Regression Trees without Variable Selection Bias)