http://dx.doi.org/10.5351/KJAS.2005.18.2.343

An Empirical Comparison of Bagging, Boosting and Support Vector Machine Classifiers in Data Mining  

Lee Yung-Seop (Dept. of Statistics, Dongguk University)
Oh Hyun-Joung (DNI consulting)
Kim Mee-Kyung (Dept. of Statistics, Dongguk University)
Publication Information
The Korean Journal of Applied Statistics, v.18, no.2, 2005, pp. 343-354
Abstract
The goal of this paper is to compare classification performance and to identify which classifier is preferable given the characteristics of the data. The methods compared are CART combined with one of two ensemble algorithms, bagging or boosting, and SVM. In an empirical study of twenty-eight data sets, we found that SVM achieves a smaller error rate than the other methods on most data sets. Comparing bagging, boosting, and SVM by data characteristics, SVM is suited to data with a small number of observations and no missing values; boosting is suited to data with a large number of observations; and bagging is suited to data with missing values.
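For concreteness, the kind of error-rate comparison the abstract describes can be sketched in a few lines. The sketch below uses scikit-learn rather than the authors' implementation, with AdaBoost standing in for the paper's boosting algorithm; the data set, base learners, fold count, and hyperparameters are illustrative assumptions, not settings taken from the paper.

# A minimal sketch of the error-rate comparison described in the abstract.
# It uses scikit-learn, not the authors' implementation; the data set,
# base learners, and settings below are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for one of the twenty-eight benchmark data sets.
X, y = load_breast_cancer(return_X_y=True)

classifiers = {
    # CART-style trees serve as the base learner for both ensembles.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    "boosting": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

for name, clf in classifiers.items():
    # Error rate = 1 - mean cross-validated accuracy.
    accuracy = cross_val_score(clf, X, y, cv=10).mean()
    print(f"{name}: error rate = {1 - accuracy:.3f}")

Under this setup, the classifier with the smallest cross-validated error rate on a given data set would be the one recommended for data with its characteristics, mirroring the paper's comparison criterion.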
Keywords
Bagging; Boosting; Data mining; Decision trees; Empirical comparison; CART; SVM
Citations & Related Records
연도 인용수 순위
References
1 Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap, Chapman and Hall
2 Kass, G.V. (1980). An exploratory technique for investigating large quantities of categorical data, Applied Statistics, 29, 119-127
3 Kearns, M. and Valiant, L.G. (1994). Cryptographic limitations on learning Boolean formulae and finite automata, Journal of the Association for Computing Machinery, 41, 67-95
4 Opitz, D. and Maclin, R. (1999). Popular ensemble methods: An empirical study, Journal of Artificial Intelligence Research, 11, 169-198
5 Platt, J., Cristianini, N. and Shawe-Taylor, J. (2000). Large margin DAGs for multiclass classification, Advances in Neural Information Processing Systems, 12, 547-553
6 Quinlan, J.R. (1993). C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo
7 Saunders, C. (1998). Support Vector Machine user manual, RHUL Technical Report
8 Schapire, R. (1990). The strength of weak learnability, Machine Learning, 5, 197-227
9 Weston, J. and Watkins, C. (1998). Multi-class support vector machines, Technical Report CSD-TR-98-04, Royal Holloway, University of London
10 Valiant, L.G. (1984). A theory of the learnable, Communications of the ACM, 27, 1134-1142
11 Vapnik, V. (1979). Estimation of Dependences Based on Empirical Data, Nauka (English translation: Springer-Verlag, 1982)
12 Vapnik, V. (1995). The Nature of Statistical Learning Theory, Springer-Verlag
13 Kim, Hyun-Joong (2004). Theory and applications of the Support Vector Machine, Proceedings of the Autumn Conference, The Korean Statistical Society, 1-1
14 Lee, Yung-Seop and Oh, Hyun-Joung (2003). A comparative analysis of bagging and boosting algorithms in data mining, Proceedings of the Spring Conference, The Korean Statistical Society, 97-102
15 Burges, C.J.C. (1998). A tutorial on support vector machines for pattern recognition, Bell Laboratories, Lucent Technologies
16 Blake, C.L. and Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science
17 Breiman, L. (1996). Bagging predictors, Machine Learning, 24, 123-140
18 Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984). Classification and Regression Trees, Chapman and Hall
19 Cristianini, N. and Shawe-Taylor, J. (2000). Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press
20 Freund, Y. (1995). Boosting a weak learning algorithm by majority, Information and Computation, 121, 256-285