A Study on Unbiased Methods in Constructing Classification Trees

  • Published : 2002.12.01

Abstract

We propose two methods that separate the variable selection step from the split-point selection step. We call these two algorithms the CHITES method and the F&CHITES method. They adopt some of the best characteristics of CART, CHAID, and QUEST. In the first step, the variable that is most significant for predicting the target class is selected. In the second step, an exhaustive search is applied to find the split point of the variable selected in the first step. We compared the proposed methods, CART, and QUEST in terms of variable selection bias and power, error rates, and training times. The proposed methods are not only unbiased in the null case, but also powerful in selecting the correct variables in non-null cases.
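The abstract describes a two-step split selection. As an illustration only, the sketch below separates the two steps in Python: step 1 picks the variable most significantly associated with the class using a chi-squared test on a quantile-binned contingency table (the specific test statistic, binning, and impurity criterion used by CHITES/F&CHITES are not given in the abstract and are assumptions here), and step 2 exhaustively searches split points of that variable by weighted Gini impurity.

```python
# Minimal two-step split selection sketch (illustrative; not the authors' code).
import numpy as np
from scipy.stats import chi2_contingency


def select_variable(X, y, n_bins=4):
    """Step 1: return the column whose quantile-binned contingency table
    with y yields the smallest chi-squared p-value (assumed test)."""
    best_j, best_p = 0, np.inf
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1))[1:-1]
        codes = np.digitize(X[:, j], np.unique(edges))
        table = np.zeros((codes.max() + 1, y.max() + 1))
        for r, c in zip(codes, y):
            table[r, c] += 1
        _, p, _, _ = chi2_contingency(table + 0.5)  # 0.5 avoids zero cells
        if p < best_p:
            best_j, best_p = j, p
    return best_j


def gini(labels):
    p = np.bincount(labels) / len(labels)
    return 1.0 - np.sum(p ** 2)


def best_split_point(x, y):
    """Step 2: exhaustive search over midpoints of consecutive unique values,
    choosing the cut that minimizes the weighted Gini impurity."""
    values = np.unique(x)
    best_cut, best_imp = None, np.inf
    for cut in (values[:-1] + values[1:]) / 2.0:
        left, right = y[x <= cut], y[x > cut]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if imp < best_imp:
            best_cut, best_imp = cut, imp
    return best_cut


# Toy usage: variable 1 carries the class signal, variable 0 is noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = np.column_stack([rng.normal(size=200), rng.normal(loc=y, scale=0.7)])
j = select_variable(X, y)
print("selected variable:", j, "split at", round(best_split_point(X[:, j], y), 3))
```

Because the variable is chosen by a significance test on all candidates before any split point is examined, variables with many distinct values get no extra chances to win, which is the source of the unbiasedness claimed for the proposed methods.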

References

  1. UCI Repository of Machine Learning Databases Blake, C.L.;Merz, C.J.
  2. Classification and Regression Trees Breiman, L.;Friedman, J.H.;Olshen, R.A.;Stone, C.J.
  3. Applied Statistics v.29 An exploratory technique for investigating large quantities of categorical data Kass, G.V. https://doi.org/10.2307/2986296
  4. Journal of the American Statistical Association v.96 Classification trees with unbiased multiway splits Kim, H.;Loh, W.Y. https://doi.org/10.1198/016214501753168271
  5. Ph.D. Thesis A Study on Bias Problems in Constructing Classification Trees Lee, Y.M.
  6. Statistica Sinica v.7 Split selection methods for classification trees Loh, W.Y.;Shih, Y.S.
  7. Journal of the American Statistical Association v.83 Tree-structured classification via generalized discriminant analysis Loh, W.Y.;Vanichsetakul, N. https://doi.org/10.2307/2289295
  8. C4.5: Programs for Machine Learning Quinlan, J.R.
  9. Proceedings of the Tenth Japan and Korea Joint Conference of Statistics A comparative study on variable selection methods in data mining software packages Song, M.S.;Yoon, Y.J.

Cited by

  1. Bias Reduction in Split Variable Selection in C4.5 vol.10, pp.3, 2003, https://doi.org/10.5351/CKSS.2003.10.3.627
  2. Input Variable Importance in Supervised Learning Models vol.10, pp.1, 2003, https://doi.org/10.5351/CKSS.2003.10.1.239
  3. An unbiased method for constructing multilabel classification trees vol.47, pp.1, 2004, https://doi.org/10.1016/j.csda.2003.10.009