http://dx.doi.org/10.5351/KJAS.2006.19.1.149

Ordinal Variable Selection in Decision Trees  

Kim Hyun-Joong (Department of Applied Statistics, Yonsei University)
Publication Information
The Korean Journal of Applied Statistics, v.19, no.1, 2006, pp. 149-161
Abstract
The most important component of a decision tree algorithm is the rule for split variable selection. Many earlier algorithms, such as CART and C4.5, use a greedy search for variable selection. Recently, many methods have been developed to cope with the weaknesses of greedy search. Most of these algorithms apply different selection criteria depending on the type of variable: continuous or nominal. Ordinal variables, however, are usually treated as continuous ones. This causes no trouble for methods based on greedy search, but it can cause problems for the newer algorithms because they rely on statistical methods valid only for continuous or nominal types. In this paper, we propose an ordinal variable selection method that uses the Cramér-von Mises testing procedure. We performed comparisons among CART, C4.5, QUEST, CRUISE, and the new method, and showed that the new method has good variable selection power for ordinal variables.
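As a rough illustration of the idea (not the paper's exact procedure), the sketch below scores a single ordinal predictor against a binary class label using SciPy's two-sample Cramér-von Mises test and ranks it by p-value, the way a split-selection step might compare candidate variables. The function name cvm_split_score, the toy data, and the restriction to two classes are all assumptions for this sketch; the paper's handling of multi-class responses may differ.

```python
import numpy as np
from scipy.stats import cramervonmises_2samp

def cvm_split_score(x_ordinal, y_binary):
    """Score an ordinal predictor for split selection (hypothetical helper).

    Compares the predictor's distribution between the two classes with the
    two-sample Cramér-von Mises test; a smaller p-value suggests the
    variable separates the classes more strongly. The paper's actual
    procedure (e.g., for more than two classes) may differ.
    """
    x0 = x_ordinal[y_binary == 0]
    x1 = x_ordinal[y_binary == 1]
    res = cramervonmises_2samp(x0, x1)
    return res.statistic, res.pvalue

# Toy example: an ordinal variable coded 1..5 whose distribution
# shifts upward with the class label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
x = np.where(y == 0,
             rng.choice([1, 2, 3, 4, 5], size=200, p=[.35, .3, .2, .1, .05]),
             rng.choice([1, 2, 3, 4, 5], size=200, p=[.05, .1, .2, .3, .35]))

stat, pval = cvm_split_score(x, y)
print(f"CvM statistic = {stat:.3f}, p-value = {pval:.4f}")
# In a tree-growing loop, the variable with the smallest p-value
# would be selected to split the current node.
```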
Keywords
Decision trees; Nonparametric statistics; Cramér-von Mises test; Ordinal variable; CART
References
1 Fisz, M. (1960). On a result by M. Rosenblatt concerning the von Mises-Smirnov test. The Annals of Mathematical Statistics, 31: 427-429
2 Kim, H. and Loh, W.-Y. (2001). Classification trees with unbiased multiway splits. Journal of the American Statistical Association, 96: 589-604
3 Kim, H. and Loh, W.-Y. (2003). Classification trees with bivariate linear discriminant node models. Journal of Computational and Graphical Statistics, 12: 512-530
4 Liu, W. Z. and White, A. P. (1994). The importance of attribute-selection measures in decision tree induction. Machine Learning, 15: 25-41
5 Loh, W.-Y. and Shih, Y.-S. (1997). Split selection methods for classification trees. Statistica Sinica, 7: 815-840
6 Martin, J. K. (1997). An exact probability metric for decision tree splitting and stopping. Machine Learning, 28: 257-297
7 Quinlan, J. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo
8 White, A. P. and Liu, W. Z. (1994). Bias in information-based measures in decision tree induction. Machine Learning, 15: 321-329
9 Burr, E. J. (1964). Small-sample distributions of the two-sample Cramér-von Mises' W² and Watson's U². The Annals of Mathematical Statistics, 35: 1091-1098
10 Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Chapman & Hall, New York
11 Kass, G. V. (1975). Significance testing in automatic interaction detection (A.I.D.). Applied Statistics, 24: 178-189
12 Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29: 119-127