Optimal Selection of Classifier Ensemble Using Genetic Algorithms  

Kim, Myung-Jong (Department of Business Administration, Pusan National University)
Publication Information
Journal of Intelligence and Information Systems / v.16, no.4, 2010, pp. 99-112
Abstract
Ensemble learning is a method for improving the performance of classification and prediction algorithms. It finds a highly accurate classifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set. Ensemble learning has received considerable attention in machine learning and artificial intelligence because of its remarkable performance improvement and its flexible integration with traditional learning algorithms such as decision trees (DT), neural networks (NN), and support vector machines (SVM). In this line of research, DT ensemble studies have consistently demonstrated impressive improvements in the generalization behavior of DT, while NN and SVM ensembles have not shown improvements as remarkable as those of DT ensembles.

Recently, several works have reported that the performance of an ensemble can be degraded when its classifiers are highly correlated with one another, which gives rise to a multicollinearity problem; these works have also proposed differentiated learning strategies to cope with this degradation. Hansen and Salamon (1990) argued that containing diverse classifiers is a necessary and sufficient condition for the performance enhancement of an ensemble. Breiman (1996) showed that ensemble learning can increase the performance of unstable learning algorithms, but does not yield remarkable improvement for stable learning algorithms. Unstable learning algorithms such as decision tree learners are sensitive to changes in the training data, so small changes in the training data can yield large changes in the generated classifiers; an ensemble of unstable learners can therefore guarantee some diversity among its classifiers. In contrast, stable learning algorithms such as NN and SVM generate similar classifiers in spite of small changes in the training data, so the correlation among the resulting classifiers is very high. This high correlation results in a multicollinearity problem, which leads to performance degradation of the ensemble.

Kim's (2009) work compared the performance of traditional prediction algorithms such as NN, DT, and SVM in bankruptcy prediction on Korean firms. It reports that the stable learners NN and SVM have higher predictability than the unstable DT, whereas, with respect to ensemble learning, the DT ensemble shows greater performance improvement than the NN and SVM ensembles. Further analysis with the variance inflation factor (VIF) empirically demonstrates that the performance degradation of the ensemble is due to multicollinearity, and the work proposes that ensemble optimization is needed to cope with this problem.

This paper proposes a hybrid system for coverage optimization of NN ensembles (CO-NN) in order to improve the performance of the NN ensemble. Coverage optimization is a technique of choosing a sub-ensemble from an original ensemble so as to guarantee the diversity of the selected classifiers. CO-NN uses a genetic algorithm (GA), which has been widely applied to various optimization problems, to solve the coverage optimization problem. The GA chromosomes for coverage optimization are encoded as binary strings, each bit of which indicates whether an individual classifier is included in the sub-ensemble.
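As a rough illustration of this chromosome encoding, the Python sketch below decodes a binary string into a sub-ensemble and combines the selected classifiers by majority voting. The paper itself uses Microsoft Excel and the Evolver package rather than custom code, so every name here is hypothetical, and majority voting is an assumed combination function, not the paper's.

import random

def random_chromosome(n_classifiers):
    # One bit per classifier: 1 = include in the sub-ensemble, 0 = exclude.
    return [random.randint(0, 1) for _ in range(n_classifiers)]

def decode(chromosome, classifiers):
    # Return the sub-ensemble of classifiers selected by the chromosome.
    return [clf for bit, clf in zip(chromosome, classifiers) if bit == 1]

def majority_vote(sub_ensemble, x):
    # Combine the selected classifiers' predictions for input x
    # by simple majority voting (an illustrative default).
    votes = [clf(x) for clf in sub_ensemble]
    return max(set(votes), key=votes.count)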
The fitness function is defined as the maximization of error reduction, and a constraint on the variance inflation factor (VIF), one of the standard measures of multicollinearity, is added to ensure the diversity of the classifiers by removing highly correlated ones. We use Microsoft Excel and the GA software package Evolver. Experiments on company failure prediction show that CO-NN achieves a stable performance enhancement of NN ensembles by choosing classifiers with the correlations within the ensemble taken into account. Classifiers with a potential multicollinearity problem are removed by the coverage optimization process, and CO-NN consequently outperforms a single NN classifier and the NN ensemble at the 1% significance level, and the DT ensemble at the 5% significance level. However, further research issues remain. First, a decision optimization process to find the optimal combination function should be considered. Second, various learning strategies for dealing with data noise should be introduced in more advanced future research.
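To make the VIF constraint concrete: the VIF of classifier i is VIF_i = 1 / (1 - R_i^2), where R_i^2 is obtained by regressing classifier i's outputs on the outputs of the other selected classifiers. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it computes VIF by plain least squares and treats the constraint as a hard rejection. The threshold of 10 (a common rule of thumb) and the accuracy callback are assumptions.

import numpy as np

def vif(outputs):
    # outputs: (n_samples, n_selected) matrix, one column per classifier.
    # VIF_i = 1 / (1 - R_i^2), with R_i^2 from regressing column i
    # on the remaining columns (plus an intercept).
    n, k = outputs.shape
    vifs = []
    for i in range(k):
        y = outputs[:, i]
        X = np.column_stack([np.ones(n), np.delete(outputs, i, axis=1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - (resid @ resid) / ss_tot if ss_tot > 0 else 0.0
        vifs.append(1.0 / (1.0 - r2) if r2 < 1.0 else np.inf)
    return np.array(vifs)

def fitness(chromosome, outputs, accuracy, vif_limit=10.0):
    # Fitness = accuracy of the selected sub-ensemble, with the VIF
    # constraint enforced by rejecting multicollinear selections.
    idx = [i for i, bit in enumerate(chromosome) if bit == 1]
    if len(idx) < 2:
        return 0.0
    if vif(outputs[:, idx]).max() > vif_limit:
        return 0.0  # constraint violated: classifiers too correlated
    return accuracy(idx)  # hypothetical callback: sub-ensemble accuracy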
Keywords
Neural Networks; Ensemble; Genetic Algorithms; Coverage Optimization;
References
1 Maclin, R. and D. Opitz, "An empirical evaluation of bagging and boosting", Proceedings of the Fourteenth National Conference on Artificial Intelligence, (1997), 546-551.
2 Maia, T. T., A. P. Braga and A. F. Carvalho, "Hybrid classification algorithms based on boosting and support vector machines", Kybernetes, Vol.37, No.9(2008), 1469-1491.
3 Oliveira, L. S., R. Sabourin, F. Bortolozzi and C. Y. Suen, "Feature selection for ensembles: a hierarchical multi-objective genetic algorithm approach", ICDAR, 2003.
4 Quinlan, J. R., "Bagging, boosting and C4.5", Proceedings of the Thirteenth National Conference on Artificial Intelligence, (1996), 725-730.
5 Valentini, G., M. Muselli and F. Ruffino, "Bagged ensembles of SVMs for gene expression data analysis", The IEEE-INNS-ENNS International Joint Conference on Neural Networks, (2003), 1844-1849.
6 Zhou, Z. H., J. X. Wu and W. Tang, "Ensembling neural networks: many could be better than all", Artificial Intelligence, Vol.137(2002), 239-263.
7 Bauer, E. and R. Kohavi, "An empirical comparison of voting classification algorithms: Bagging, boosting, and variants", Machine Learning, Vol.36(1999), 105-139.
8 Breiman, L., "Bagging predictors", Machine Learning, Vol.24, No.2(1996), 123-140.
9 Buciu, I., C. Kotropoulos and I. Pitas, "Combining support vector machines for accurate face detection", Proc. ICIP, (2001), 1054-1057.
10 Dong, Y. S. and K. S. Han, "A comparison of several ensemble methods for text categorization", IEEE International Conference on Services Computing, 2004.
11 Drucker, H. and C. Cortes, "Boosting decision trees", Advances in Neural Information Processing Systems, Vol.8(1996).
12 Evgeniou, T., L. Perez-Breva, M. Pontil and T. Poggio, "Bounds on the generalization performance of kernel machine ensembles", Proc. ICML, (2000), 271-278.
13 Fawcett, T., "An introduction to ROC analysis", Pattern Recognition Letters, Vol.27(2006), 861-874.
14 Freund, Y. and R. E. Schapire, "A decision theoretic generalization of online learning and an application to boosting", Journal of Computer and System Sciences, Vol.55, No.1(1997), 119-139.
15 Hansen, L. and P. Salamon, "Neural network ensembles", IEEE Trans. PAMI, Vol.12(1990), 993-1001.
16 Ho, T. K., "Multiple classifier combination: lessons and next steps", in Hybrid Methods in Pattern Recognition (ed. by H. Bunke and A. Kandel), World Scientific, 2002.
17 Kim, M. J., "A Performance Comparison of Ensembles in Bankruptcy Prediction", Entrue Journal of Information Technology, Vol.8, No.2(2009), 41-49.
18 Kim, M. J. and D. G. Kang, "An ensemble with neural networks for bankruptcy prediction", Expert Systems with Applications, Vol.37(2010), 3373-3379.
19 Kim, Y. W. and I. S. Oh, "Classifier ensemble selection using hybrid genetic algorithms", Pattern Recognition Letters, Vol.29, No.6(2008), 796-802.
20 Alfaro, E., M. Gamez and N. García, "Multiclass corporate failure prediction by AdaBoost.M1", International Advances in Economic Research, Vol.13(2007), 301-312.
21 Alfaro, E., N. Garcia, M. Gamez and D. Elizondo, "Bankruptcy forecasting: an empirical comparison of AdaBoost and neural networks", Decision Support Systems, Vol.45(2008), 110-122.