A Methodology for Bankruptcy Prediction in Imbalanced Datasets using eXplainable AI

  • Heo, Sun-Woo (Department of Management Consulting, Graduate School of Hanyang University)
  • Baek, Dong Hyun (Division of Business Administration, Hanyang University)
  • Received : 2022.05.29
  • Accepted : 2022.06.25
  • Published : 2022.06.30

Abstract

Recently, machine learning algorithms have been used alongside traditional statistical techniques to make bankruptcy predictions more accurate. However, the insolvency rate of companies that deal with financial institutions is very low, which results in a data imbalance problem. Because data imbalance degrades the performance of artificial intelligence models, the imbalance must be addressed first. In addition, as artificial intelligence algorithms become more advanced for precise decision-making, regulatory pressure to secure the transparency of artificial intelligence models is gradually increasing, for example by mandating explanation functions for such models. Therefore, this study aims to present guidelines for an eXplainable Artificial Intelligence-based corporate bankruptcy prediction methodology that applies the SMOTE technique and the LIME algorithm to address the data imbalance problem and the model transparency problem in predicting corporate bankruptcy. The implications of this study are as follows. First, it was confirmed that SMOTE can effectively resolve the data imbalance issue, a problem that is easily overlooked in predicting corporate bankruptcy. Second, the LIME algorithm was used to visualize the basis for the machine learning model's bankruptcy predictions and to derive improvement priorities for the financial variables that increase a company's likelihood of bankruptcy. Third, the scope for applying these algorithms in future research was expanded by confirming, through case application, that SMOTE and LIME can be used.
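As a rough illustration of the oversampling idea behind SMOTE mentioned in the abstract, the sketch below creates synthetic minority-class (bankrupt) samples by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified, standard-library-only sketch and not the implementation used in the study: the function name `smote`, its parameters, and the toy two-ratio dataset are all assumptions made for illustration, and base samples are drawn at random rather than visited in turn as in the original SMOTE formulation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority neighbours.

    Hypothetical helper for illustration; not the study's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed, not from the paper): two financial ratios per firm.
healthy = [[0.1 * i, 1.0 + 0.05 * i] for i in range(20)]
bankrupt = [[2.0, 0.2], [2.2, 0.1], [2.4, 0.3], [2.1, 0.25]]

# Oversample the bankrupt class until both classes have the same size.
new_bankrupt = smote(bankrupt, n_new=len(healthy) - len(bankrupt), k=3)
balanced_bankrupt = bankrupt + new_bankrupt
print(len(healthy), len(balanced_bankrupt))  # prints "20 20"
```

In practice, oversampling of this kind is normally applied to the training split only, so that evaluation data keeps the original class ratio.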

Acknowledgement

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2019S1A5C2A04083153).
