A Study on Classification Models for Predicting Bankruptcy Based on XAI

  • Received : 2022.12.26
  • Accepted : 2023.04.24
  • Published : 2023.08.31

Abstract

Efficient prediction of corporate bankruptcy helps financial institutions make sound lending decisions and reduce loan default rates, and many studies have applied artificial intelligence-based classification models to the task. In the financial industry, however, even a high-performing predictive model must be accompanied by an intuitive explanation of the basis for its output. The US, the EU, and South Korea have all recently put forward a right to request an explanation of algorithmic decisions, so the use of AI in the financial sector must be transparent. This paper proposes an interpretable AI-based classification model built on publicly available corporate bankruptcy data. After data preprocessing, ten supervised classification models, including logistic regression, SVM, XGBoost, and LightGBM, were optimized and their classification performance was compared using 5-fold cross-validation. LightGBM was identified as the best-performing model, and SHAP, an explainable artificial intelligence technique, was applied to provide a post-hoc explanation of its bankruptcy predictions.

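To make the idea behind the post-hoc explanation step concrete, the following is a minimal sketch of the Shapley-value attribution that SHAP is built on. It computes exact Shapley values by brute-force enumeration of feature coalitions, which is not the algorithm the SHAP library uses for tree models such as LightGBM and is only feasible for a handful of features. The scoring function `toy_risk_score`, the three financial ratios, and all numeric values are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to baseline.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for very few features.
    """
    n = len(x)
    features = list(range(n))

    def value(coalition):
        # Coalition features come from x, all others from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical risk score over three made-up financial ratios
# (assumption for illustration only, not the paper's model).
def toy_risk_score(z):
    debt_ratio, current_ratio, roa = z
    # The last term is an interaction: debt weighs more when ROA is negative.
    return 0.6 * debt_ratio - 0.2 * current_ratio - 1.5 * roa + 0.4 * debt_ratio * (roa < 0)


firm = [0.9, 0.8, -0.05]     # hypothetical firm being explained
baseline = [0.5, 1.5, 0.04]  # hypothetical reference firm

phi = shapley_values(toy_risk_score, firm, baseline)
for name, contribution in zip(["debt_ratio", "current_ratio", "roa"], phi):
    print(f"{name:>14}: {contribution:+.4f}")
```

The contributions sum to the difference between the score for the explained firm and the score for the baseline, which is the additivity property that lets each prediction be broken down feature by feature.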
