http://dx.doi.org/10.5351/CKSS.2006.13.1.151

Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap  

Kim Ji-Hyun (Dept. of Statistics, Soongsil University)
Cha Eun-Song (Dept. of Statistics, Soongsil University)
Publication Information
Communications for Statistical Applications and Methods, Vol. 13, No. 1, 2006, pp. 151-165
Abstract
It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and the bootstrap are two possible approaches in this case. In the related literature, bootstrap estimators of the true misclassification rate were asserted to outperform cross-validation estimators for small samples. We compare the two estimators empirically when the classification rule is so adaptive to the training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators perform better for small samples because of their small variance, and we find that their bias tends to remain significant even for moderate to large samples, in which case cross-validation estimators perform better with less computation.
Keywords
Generalization Error; Prediction Accuracy; Classification Tree; Boosting
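The setting compared in the abstract can be sketched in code. The following is a minimal illustration, not the paper's implementation (the study was carried out in R with rpart and boosting; see the references): a 1-nearest-neighbor rule, whose apparent (resubstitution) error is zero by construction, is scored both by k-fold cross-validation and by Efron's .632 bootstrap. All function names and the synthetic data below are hypothetical.

```python
import numpy as np

def one_nn_predict(train_X, train_y, X):
    # 1-NN classifier: each test point gets the label of its nearest
    # training point. On the training set itself, every point is its own
    # nearest neighbor, so the apparent error is exactly zero -- the
    # highly adaptive regime discussed in the abstract.
    d = ((X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=-1)
    return train_y[d.argmin(axis=1)]

def cv_error(X, y, k=10, seed=0):
    # k-fold cross-validation estimate of the misclassification rate.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        mask = np.ones(len(y), dtype=bool)
        mask[fold] = False
        pred = one_nn_predict(X[mask], y[mask], X[fold])
        errs.append((pred != y[fold]).mean())
    return float(np.mean(errs))

def bootstrap632_error(X, y, B=100, seed=0):
    # Efron's .632 bootstrap: average the error on out-of-bootstrap
    # points, then shrink toward the apparent error.
    rng = np.random.default_rng(seed)
    n = len(y)
    oob_errs = []
    for _ in range(B):
        boot = rng.integers(0, n, size=n)          # sample with replacement
        oob = np.setdiff1d(np.arange(n), boot)     # points left out
        if oob.size == 0:
            continue
        pred = one_nn_predict(X[boot], y[boot], X[oob])
        oob_errs.append((pred != y[oob]).mean())
    eps_oob = float(np.mean(oob_errs))             # leave-one-out bootstrap error
    apparent = float((one_nn_predict(X, y, X) != y).mean())  # 0 for 1-NN
    return 0.368 * apparent + 0.632 * eps_oob

# Hypothetical small two-class sample of the kind used in the comparison.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
cv = cv_error(X, y, k=5)
b632 = bootstrap632_error(X, y, B=50)
```

Because the apparent error of 1-NN is zero, the .632 estimator reduces to 0.632 times the out-of-bootstrap error, which is the source of the downward bias the abstract reports for larger samples; the .632+ rule of Efron and Tibshirani (1997) was designed to correct exactly this.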
References
1 Bauer, E. and Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, Vol. 36, 105-139.
2 Merler, S. and Furlanello, C. (1997). Selection of tree-based classifiers with the bootstrap 632+ rule. RIST Technical Report TR-9605-01, revised Jan. 1997.
3 Efron, B. and Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, Vol. 92, 548-560.
4 Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall.
5 Therneau, T.M. and Atkinson, E.J. (1997). An introduction to recursive partitioning using the RPART routines. Technical Report, Mayo Foundation.
6 Cha, E.S. (2005). A comparative study of methods for estimating prediction error. Master's thesis, Soongsil University. (In Korean)
7 Blake, C.L. and Merz, C.J. (1998). UCI Repository of machine learning databases. University of California, Irvine, Department of Information and Computer Science.
8 Braga-Neto, U.M. and Dougherty, E.R. (2004). Is cross-validation valid for small-sample microarray classification? Bioinformatics, Vol. 20, 374-380.
9 Crawford, S.L. (1989). Extensions to the CART algorithm. International Journal of Man-Machine Studies, Vol. 31, 197-217.
10 Efron, B. (1983). Estimating the error rate of a prediction rule: Improvement on cross-validation. Journal of the American Statistical Association, Vol. 78, 316-331.
11 Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, Vol. 55, 119-139.
12 Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Technical Report, Stanford University, Department of Computer Science.
13 R Development Core Team (2004). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. Available from http://www.R-project.org