http://dx.doi.org/10.11627/jkise.2016.39.1.105

Improving an Ensemble Model Using Instance Selection Method  

Min, Sung-Hwan (Department of Business Administration, Hallym University)
Publication Information
Journal of Korean Society of Industrial and Systems Engineering / v.39, no.1, 2016, pp. 105-115
Abstract
Ensemble classification combines individually trained classifiers to yield more accurate predictions than any single model, and ensemble techniques are very useful for improving the generalization ability of classifiers. The random subspace ensemble technique is a simple but effective method for constructing ensemble classifiers; it randomly draws a subset of the features for each classifier in the ensemble. The instance selection technique retains critical instances while removing irrelevant and noisy instances from the original dataset. The instance selection and random subspace methods are both well known in the field of data mining and have proven to be very effective in many applications. However, few studies have focused on integrating the two. Therefore, this study proposed a new hybrid ensemble model that integrates instance selection and random subspace techniques using genetic algorithms (GAs) to improve the performance of a random subspace ensemble model. GAs are used to select optimal (or near-optimal) instances, which are then used as input data for the random subspace ensemble model. The proposed model was applied to both Kaggle credit data and corporate credit data, and the results were compared with those of other models in terms of classification accuracy, level of diversity, and average classification rate of the base classifiers in the ensemble. The experimental results demonstrated that the proposed model outperformed the other models, including the single model, the instance selection model, and the original random subspace ensemble model.
Keywords
Random Subspace; Bagging; Bankruptcy Prediction; Genetic Algorithms
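
The hybrid scheme outlined in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not specify the base classifier, the fitness function, or the GA operators, so everything below is an assumption made for illustration. The sketch assumes 1-nearest-neighbour base classifiers, validation accuracy of the ensemble as the GA fitness, truncation selection with one-point crossover and bit-flip mutation, and a synthetic toy dataset with flipped labels standing in for the credit data. All function names (`fit_random_subspace`, `ga_select_instances`, `make_toy_data`, and so on) are hypothetical.

```python
import random

def nn_predict(train, x, feats):
    """1-nearest-neighbour label for x, using only the feature indices in feats."""
    nearest = min(train, key=lambda row: sum((row[0][f] - x[f]) ** 2 for f in feats))
    return nearest[1]

def fit_random_subspace(n_features, n_estimators, subspace_size, rng):
    """Random subspace step: draw one random feature subset per base classifier."""
    return [rng.sample(range(n_features), subspace_size) for _ in range(n_estimators)]

def ensemble_predict(train, subspaces, x):
    """Majority vote over the base classifiers (binary labels 0/1)."""
    votes = sum(nn_predict(train, x, feats) for feats in subspaces)
    return 1 if 2 * votes > len(subspaces) else 0

def accuracy(train, subspaces, data):
    """Fraction of rows in data that the ensemble built on train labels correctly."""
    return sum(ensemble_predict(train, subspaces, x) == y for x, y in data) / len(data)

def ga_select_instances(train, valid, subspaces, rng,
                        pop_size=10, generations=10, p_mut=0.05):
    """Instance selection step: GA over bit masks, bit i = 1 keeps training instance i."""
    n = len(train)

    def fitness(mask):
        # Assumed fitness: validation accuracy of the ensemble on the kept instances.
        kept = [row for row, bit in zip(train, mask) if bit]
        return accuracy(kept, subspaces, valid) if kept else 0.0

    # Seed the population with the "keep everything" mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection; parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ 1 if rng.random() < p_mut else bit for bit in child])
        pop = parents + children
    return max(pop, key=fitness)

def make_toy_data(n, rng, noise=0.15):
    """Synthetic binary data: 2 informative features, 4 noise features, some flipped labels."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(2.0 * y, 1.0) for _ in range(2)] + [rng.gauss(0.0, 1.0) for _ in range(4)]
        if rng.random() < noise:
            y ^= 1                              # noisy instance the GA may learn to drop
        data.append((x, y))
    return data

rng = random.Random(0)
train, valid = make_toy_data(60, rng), make_toy_data(40, rng)
subspaces = fit_random_subspace(n_features=6, n_estimators=5, subspace_size=3, rng=rng)
mask = ga_select_instances(train, valid, subspaces, rng)
selected = [row for row, bit in zip(train, mask) if bit]
baseline = accuracy(train, subspaces, valid)
tuned = accuracy(selected, subspaces, valid)
print(f"kept {len(selected)}/{len(train)} instances; "
      f"validation accuracy {baseline:.2f} -> {tuned:.2f}")
```

Because the "keep everything" mask is part of the initial population and the best parents survive each generation, the selected subset never scores below the unfiltered training set on the validation data. That score is optimistic, since the GA optimizes it directly; a fair comparison would report accuracy on a separate held-out test set.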