http://dx.doi.org/10.5351/CKSS.2010.17.4.561

A Study for Improving the Performance of Data Mining Using Ensemble Techniques  

Jung, Yon-Hae (Department of Statistics, Korea University)
Eo, Soo-Heang (Department of Statistics, Korea University)
Moon, Ho-Seok (Department of Computer and Information, Korea Military Academy)
Cho, Hyung-Jun (Department of Statistics, Korea University)
Publication Information
Communications for Statistical Applications and Methods / v.17, no.4, 2010, pp. 561-574
Abstract
We studied the performance of eight data mining algorithms, including decision trees, logistic regression, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), neural networks, and support vector machines (SVM), together with their combinations with two ensemble techniques, bagging and boosting. In this study, we utilized 13 data sets with binary responses. Sensitivity, specificity, and misclassification error were used as the criteria for comparison.
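The comparison the abstract describes can be reproduced in outline. The following is a minimal sketch, not the authors' code: it pairs a single decision tree with its bagged and boosted versions and scores each by the paper's three criteria. scikit-learn and a synthetic data set are assumptions standing in for the study's eight algorithms and 13 real binary-response data sets.

```python
# Minimal sketch (not the authors' code): a base classifier versus its bagged
# and boosted versions on a binary-response data set, scored by sensitivity,
# specificity, and misclassification error. The synthetic data set is a
# stand-in for the study's 13 real data sets.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

base = DecisionTreeClassifier(random_state=0)
models = {
    "single tree": base,
    "bagging": BaggingClassifier(estimator=base, n_estimators=100, random_state=0),
    "boosting": AdaBoostClassifier(estimator=base, n_estimators=100, random_state=0),
}

for name, model in models.items():
    y_hat = model.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, y_hat).ravel()
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    error = (fp + fn) / len(y_te)  # misclassification error
    print(f"{name:12s} sens={sensitivity:.3f} spec={specificity:.3f} err={error:.3f}")
```

On data like this, both ensembles typically reduce the misclassification error of the single unstable tree, which is the kind of effect the study measures across its 13 data sets.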
Keywords
Ensemble; bagging; boosting; data mining