The Optimization of Ensembles for Bankruptcy Prediction

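  • 김명종 (Department of Business Administration, College of Business, Pusan National University);
  • 윤우섭 (Department of Business Administration, College of Business, Pusan National University)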
  • Received : 2021.10.29
  • Accepted : 2021.12.29
  • Published : 2022.02.28

Abstract

This paper proposes the GMOPTBoost algorithm to improve the performance of AdaBoost ensembles for bankruptcy prediction, a task in which the class imbalance problem is inherent. AdaBoost has the advantage of giving misclassified samples a robust opportunity to be relearned. However, because it is built on arithmetic mean accuracy, it is limited in its ability to address class imbalance. GMOPTBoost optimizes geometric mean accuracy by applying Gaussian gradient descent and can thereby address the class imbalance problem effectively. The samples are constructed in two phases. First, five class-imbalanced datasets are constructed to verify both the effect of class imbalance on the performance of the prediction model and the performance improvement achieved by GMOPTBoost. Second, class-balanced datasets are constructed through data sampling techniques to verify the performance improvement achieved by GMOPTBoost on balanced data. The main results of 30 cross-validation analyses are as follows. First, the class imbalance problem degrades the performance of ensembles. Second, GMOPTBoost significantly improves the performance of AdaBoost ensembles trained on imbalanced datasets. Third, data sampling techniques have a positive effect on performance. Finally, GMOPTBoost contributes to a significant performance improvement for AdaBoost ensembles trained on balanced datasets.
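The contrast between arithmetic mean accuracy and geometric mean accuracy can be illustrated with a small sketch. It is not the GMOPTBoost algorithm itself. It assumes the common definition of geometric mean accuracy as the square root of sensitivity times specificity, which the abstract does not spell out. The function names, the 95:5 class ratio, and the two toy predictors are hypothetical and chosen only for illustration.

```python
import math

def arithmetic_accuracy(y_true, y_pred):
    """Share of all samples classified correctly (dominated by the majority class)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def geometric_mean_accuracy(y_true, y_pred, positive=1):
    """sqrt(sensitivity * specificity); assumed definition, not taken from the paper."""
    pairs = list(zip(y_true, y_pred))
    n_pos = sum(1 for t, _ in pairs if t == positive)
    n_neg = len(pairs) - n_pos
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    true_neg = sum(1 for t, p in pairs if t != positive and p != positive)
    sensitivity = true_pos / n_pos   # accuracy on the bankrupt (minority) class
    specificity = true_neg / n_neg   # accuracy on the healthy (majority) class
    return math.sqrt(sensitivity * specificity)

# Hypothetical imbalanced sample: 95 healthy firms (0), 5 bankrupt firms (1).
y_true = [0] * 95 + [1] * 5

# Predictor A labels every firm as healthy.
always_healthy = [0] * 100

# Predictor B flags 10 healthy firms by mistake but catches 4 of 5 bankruptcies.
catches_most = [0] * 85 + [1] * 10 + [1] * 4 + [0] * 1

for name, y_pred in [("always_healthy", always_healthy),
                     ("catches_most", catches_most)]:
    print(name,
          round(arithmetic_accuracy(y_true, y_pred), 3),
          round(geometric_mean_accuracy(y_true, y_pred), 3))
```

In this toy setting, predictor A scores higher on arithmetic accuracy even though it never detects a bankruptcy, while its geometric mean accuracy is zero. That is the kind of behavior a geometric-mean-based objective is meant to penalize.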

Acknowledgement

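This research was carried out as a result of the University ICT Research Center (ITRC) support program of the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP-2020-0-01797).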
