http://dx.doi.org/10.5351/KJAS.2005.18.2.343

An Empirical Comparison of Bagging, Boosting and Support Vector Machine Classifiers in Data Mining  

Lee Yung-Seop (Dept. of Statistics, Dongguk University)
Oh Hyun-Joung (DNI consulting)
Kim Mee-Kyung (Dept. of Statistics, Dongguk University)
Publication Information
The Korean Journal of Applied Statistics, v.18, no.2, 2005, pp. 343-354
Abstract
The goal of this paper is to compare classification performance and to identify which classifier is preferable given the characteristics of the data. The methods compared are CART combined with one of two ensemble algorithms, bagging or boosting, and SVM. In an empirical study of twenty-eight data sets, we found that SVM achieves a smaller error rate than the other methods on most data sets. Comparing bagging, boosting, and SVM by data characteristics, SVM is suited to data with a small number of observations and no missing values; boosting is suited to data with a large number of observations; and bagging is suited to data with missing values.
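For concreteness, the kind of error-rate comparison the abstract describes can be sketched in a few lines. The sketch below uses scikit-learn rather than the authors' implementation, with AdaBoost standing in for the paper's boosting algorithm; the data set, base learners, fold count, and hyperparameters are illustrative assumptions, not settings taken from the paper.

# A minimal sketch of the error-rate comparison described in the abstract.
# It uses scikit-learn, not the authors' implementation; the data set,
# base learners, and settings below are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for one of the twenty-eight benchmark data sets.
X, y = load_breast_cancer(return_X_y=True)

classifiers = {
    # CART-style trees serve as the base learner for both ensembles.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    "boosting": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

for name, clf in classifiers.items():
    # Error rate = 1 - mean cross-validated accuracy.
    accuracy = cross_val_score(clf, X, y, cv=10).mean()
    print(f"{name}: error rate = {1 - accuracy:.3f}")

Under this setup, the classifier with the smallest cross-validated error rate on a given data set would be the one recommended for data with its characteristics, mirroring the paper's comparison criterion.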
Keywords
Bagging; Boosting; Data mining; Decision trees; Empirical comparison; CART; SVM
Citations & Related Records
연도 인용수 순위
References
1 Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap, Chapman and Hall
2 Kass, G.V. (1980). An exploratory technique for investigating large quantities of categorical data, Applied Statistics, 29, 119-127
3 Kearns, M. and Valiant, L.G. (1994). Cryptographic limitations on learning Boolean formulae and finite automata, Journal of the Association for Computing Machinery, 41, 67-95
4 Opitz, D. and Maclin, R. (1999). Popular ensemble methods: An empirical study, Journal of Artificial Intelligence Research, 11, 169-198
5 Platt, J., Cristianini, N. and Shawe-Taylor, J. (2000). Large margin DAGs for multiclass classification, Advances in Neural Information Processing Systems, 12, 547-553
6 Quinlan, J.R. (1993). C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo
7 Saunders, C. (1998). Support Vector Machine user manual, RHUL Technical Report
8 Schapire, R. (1990). The strength of weak learnability, Machine Learning, 5, 197-227
9 Weston, J. and Watkins, C. (1998). Multi-class support vector machines, Technical Report CSD-TR-98-04, Royal Holloway, University of London
10 Valiant, L.G. (1984). A theory of the learnable, Communications of the ACM, 27, 1134-1142
11 Vapnik, V. (1979). Estimation of Dependences Based on Empirical Data, Nauka (English translation: Springer-Verlag, 1982)
12 Vapnik, V. (1995). The Nature of Statistical Learning Theory, Springer-Verlag
13 Kim, Hyun-Joong (2004). Theory and applications of the Support Vector Machine, Proceedings of the Autumn Conference, The Korean Statistical Society, 1-1
14 Lee, Yung-Seop and Oh, Hyun-Joung (2003). A comparative analysis of bagging and boosting algorithms in data mining, Proceedings of the Spring Conference, The Korean Statistical Society, 97-102
15 Burges, C.J.C. (1998). A tutorial on support vector machines for pattern recognition, Bell Laboratories, Lucent Technologies
16 Blake, C.L. and Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science
17 Breiman, L. (1996). Bagging predictors, Machine Learning, 24, 123-140
18 Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984). Classification and Regression Trees, Chapman and Hall
19 Cristianini, N. and Shawe-Taylor, J. (2000). Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press
20 Freund, Y. (1995). Boosting a weak learning algorithm by majority, Information and Computation, 121, 256-285