Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap

  • Published: 2006.04.01

Abstract

It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and the bootstrap are two possible approaches in this case. In the related literature, bootstrap estimators of the true misclassification rate have been reported to perform better than cross-validation estimators for small samples. We compare the two estimators empirically when the classification rule is so adaptive to the training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators perform better for small samples because of their small variance, but we also find that their bias tends to remain significant even for moderate to large samples, in which case cross-validation estimators perform better while requiring less computation.
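The setting described in the abstract can be sketched in code. The snippet below is an illustrative example, not the paper's own implementation: it uses a 1-nearest-neighbour rule (an adaptive classifier whose apparent misclassification rate is exactly zero) on synthetic overlapping Gaussian data, and estimates the true error rate with leave-one-out cross-validation and with Efron's .632 bootstrap. All function names and data parameters are assumptions made for this sketch.

```python
# Sketch: leave-one-out cross-validation vs. the .632 bootstrap for
# estimating the true misclassification rate of a 1-nearest-neighbour
# rule, whose apparent (resubstitution) error rate is zero.
import numpy as np

rng = np.random.default_rng(0)

def one_nn_predict(train_x, train_y, x):
    """Predict the label of x with the 1-nearest-neighbour rule."""
    return train_y[np.argmin(np.abs(train_x - x))]

def loo_cv_error(x, y):
    """Leave-one-out cross-validation estimate of the error rate."""
    n = len(x)
    errors = 0
    for i in range(n):
        mask = np.arange(n) != i
        if one_nn_predict(x[mask], y[mask], x[i]) != y[i]:
            errors += 1
    return errors / n

def boot632_error(x, y, B=100):
    """Efron's .632 bootstrap: 0.368 * apparent + 0.632 * eps0."""
    n = len(x)
    apparent = 0.0  # 1-NN: each training point is its own nearest neighbour
    eps0_num, eps0_den = 0, 0
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # bootstrap sample (with replacement)
        out = np.setdiff1d(np.arange(n), idx)   # points left out of this sample
        for i in out:
            eps0_den += 1
            if one_nn_predict(x[idx], y[idx], x[i]) != y[i]:
                eps0_num += 1
    eps0 = eps0_num / max(eps0_den, 1)          # error on left-out points only
    return 0.368 * apparent + 0.632 * eps0

# Two overlapping Gaussian classes: the true error is nonzero even though
# the apparent error of the 1-NN rule is zero.
n = 30
x = np.concatenate([rng.normal(0.0, 1.0, n // 2), rng.normal(1.5, 1.0, n // 2)])
y = np.concatenate([np.zeros(n // 2, int), np.ones(n // 2, int)])

print("LOO CV estimate:        ", loo_cv_error(x, y))
print(".632 bootstrap estimate:", boot632_error(x, y))
```

Because the apparent error is zero here, the .632 estimate reduces to 0.632 times the out-of-bag error, which illustrates the downward bias the abstract reports for strongly adaptive rules.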

References

  1. Cha, E.S. (2005). A comparative study on methods for estimating prediction errors. Master's thesis, Soongsil University
  2. Bauer, E. and Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, Vol. 36, 105-139 https://doi.org/10.1023/A:1007515423169
  3. Blake, C.L. and Merz, C.J. (1998). UCI Repository of machine learning databases. University of California in Irvine, Department of Information and Computer Science
  4. Braga-Neto, U.M. and Dougherty, E.R. (2004). Is cross-validation valid for small-sample microarray classification? Bioinformatics, Vol. 20, 374-380 https://doi.org/10.1093/bioinformatics/btg419
  5. Crawford, S.L. (1989). Extensions to the CART algorithm. International Journal of Man-Machine Studies, Vol. 31, 197-217 https://doi.org/10.1016/0020-7373(89)90027-8
  6. Efron, B. (1983). Estimating the error rate of a prediction rule: Improvement on cross-validation. Journal of the American Statistical Association, Vol. 78, 316-331 https://doi.org/10.2307/2288636
  7. Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap, Chapman and Hall
  8. Efron, B. and Tibshirani, R. (1997). Improvements on cross-validation: The 632+ bootstrap method. Journal of the American Statistical Association, Vol. 92, 548-560 https://doi.org/10.2307/2965703
  9. Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, Vol. 55, 119-139 https://doi.org/10.1006/jcss.1997.1504
  10. Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Technical Report, Stanford University, Department of Computer Sciences
  11. Merler, S. and Furlanello, C. (1997). Selection of tree-based classifiers with the bootstrap 632+ rule. RIST Technical Report: TR-9605-01, revised Jan 97
  12. R Development Core Team (2004). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. Available from http://www.R-project.org
  13. Therneau, T.M. and Atkinson, E.J. (1997). An introduction to recursive partitioning using the RPART routines. Technical Report, Mayo Foundation

Cited by

  1. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap vol.53, pp.11, 2009, https://doi.org/10.1016/j.csda.2009.04.009