Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap

  • Published: 2006.04.01

Abstract

It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and the bootstrap are two possible approaches in this case. In the related literature, bootstrap estimators of the true misclassification rate have been reported to perform better than cross-validation estimators for small samples. We compare the two estimators empirically when the classification rule is so adaptive to the training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators perform better for small samples because of their small variance, but we also find that their bias tends to remain significant even for moderate to large samples, in which case cross-validation estimators perform better while requiring less computation.
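The setting described in the abstract can be sketched in code. The snippet below is an illustrative example, not the paper's own implementation: it uses a 1-nearest-neighbour rule (an adaptive classifier whose apparent misclassification rate is exactly zero) on synthetic overlapping Gaussian data, and estimates the true error rate with leave-one-out cross-validation and with Efron's .632 bootstrap. All function names and data parameters are assumptions made for this sketch.

```python
# Sketch: leave-one-out cross-validation vs. the .632 bootstrap for
# estimating the true misclassification rate of a 1-nearest-neighbour
# rule, whose apparent (resubstitution) error rate is zero.
import numpy as np

rng = np.random.default_rng(0)

def one_nn_predict(train_x, train_y, x):
    """Predict the label of x with the 1-nearest-neighbour rule."""
    return train_y[np.argmin(np.abs(train_x - x))]

def loo_cv_error(x, y):
    """Leave-one-out cross-validation estimate of the error rate."""
    n = len(x)
    errors = 0
    for i in range(n):
        mask = np.arange(n) != i
        if one_nn_predict(x[mask], y[mask], x[i]) != y[i]:
            errors += 1
    return errors / n

def boot632_error(x, y, B=100):
    """Efron's .632 bootstrap: 0.368 * apparent + 0.632 * eps0."""
    n = len(x)
    apparent = 0.0  # 1-NN: each training point is its own nearest neighbour
    eps0_num, eps0_den = 0, 0
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # bootstrap sample (with replacement)
        out = np.setdiff1d(np.arange(n), idx)   # points left out of this sample
        for i in out:
            eps0_den += 1
            if one_nn_predict(x[idx], y[idx], x[i]) != y[i]:
                eps0_num += 1
    eps0 = eps0_num / max(eps0_den, 1)          # error on left-out points only
    return 0.368 * apparent + 0.632 * eps0

# Two overlapping Gaussian classes: the true error is nonzero even though
# the apparent error of the 1-NN rule is zero.
n = 30
x = np.concatenate([rng.normal(0.0, 1.0, n // 2), rng.normal(1.5, 1.0, n // 2)])
y = np.concatenate([np.zeros(n // 2, int), np.ones(n // 2, int)])

print("LOO CV estimate:        ", loo_cv_error(x, y))
print(".632 bootstrap estimate:", boot632_error(x, y))
```

Because the apparent error is zero here, the .632 estimate reduces to 0.632 times the out-of-bag error, which illustrates the downward bias the abstract reports for strongly adaptive rules.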

References

  1. Cha, E.S. (2005). A comparative study on methods for estimating prediction errors. Master's thesis, Soongsil University
  2. Bauer, E. and Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, Vol. 36, 105-139 https://doi.org/10.1023/A:1007515423169
  3. Blake, C.L. and Merz, C.J. (1998). UCI Repository of machine learning databases. University of California in Irvine, Department of Information and Computer Science
  4. Braga-Neto, U.M. and Dougherty, E.R. (2004). Is cross-validation valid for small-sample microarray classification? Bioinformatics, Vol. 20, 374-380 https://doi.org/10.1093/bioinformatics/btg419
  5. Crawford, S.L. (1989). Extensions to the CART algorithm. International Journal of Man-Machine Studies, Vol. 31, 197-217 https://doi.org/10.1016/0020-7373(89)90027-8
  6. Efron, B. (1983). Estimating the error rate of a prediction rule: Improvement on cross-validation. Journal of the American Statistical Association, Vol. 78, 316-331 https://doi.org/10.2307/2288636
  7. Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap, Chapman and Hall
  8. Efron, B. and Tibshirani, R. (1997). Improvements on cross-validation: The 632+ bootstrap method. Journal of the American Statistical Association, Vol. 92, 548-560 https://doi.org/10.2307/2965703
  9. Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, Vol. 55, 119-139 https://doi.org/10.1006/jcss.1997.1504
  10. Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Technical Report, Stanford University, Department of Computer Sciences
  11. Merler, S. and Furlanello, C. (1997). Selection of tree-based classifiers with the bootstrap 632+ rule. RIST Technical Report: TR-9605-01, revised Jan 97
  12. R Development Core Team (2004). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. Available from http://www.R-project.org
  13. Therneau, T.M. and Atkinson, E.J. (1997). An introduction to recursive partitioning using the RPART routines. Technical Report, Mayo Foundation

Cited by

  1. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap vol.53, pp.11, 2009, https://doi.org/10.1016/j.csda.2009.04.009