Incorporating BERT-based NLP and Transformer for An Ensemble Model and its Application to Personal Credit Prediction

  • Received : 2023.11.05
  • Accepted : 2024.01.26
  • Published : 2024.04.30

Abstract

Tree-based algorithms have been the dominant methods used to build prediction models for tabular data, including personal credit data. However, they are compatible only with categorical and numerical features, and they do not capture relationships between features. In this work, we propose an ensemble model based on the Transformer architecture that incorporates text features and harnesses the self-attention mechanism to address the feature-relationship limitation. We describe a text formatter module that converts the original tabular data into sentences, which are fed into FinBERT along with other text features. In addition, we employ an FT-Transformer trained on the original tabular data. We evaluate this multi-modal approach against two popular tree-based algorithms, Random Forest and Extreme Gradient Boosting (XGBoost), as well as TabTransformer. Our proposed method achieves superior Default Recall, F1 score, and AUC results across two public datasets. These results are significant for financial institutions seeking to reduce the risk of financial loss from defaulters.
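To illustrate the text formatter idea described above, the sketch below serializes one tabular credit record into a natural-language sentence so that it can be tokenized by a BERT-style model such as FinBERT, while the original row would still go to the FT-Transformer branch. The column names, template wording, and function name are illustrative assumptions, not the authors' exact implementation.

```python
def row_to_sentence(row: dict) -> str:
    """Serialize one tabular credit record into a sentence for a
    BERT-style text encoder (a hypothetical formatter template)."""
    # Turn each column/value pair into a readable clause,
    # e.g. {"annual_income": 48000} -> "annual income is 48000".
    parts = [f"{key.replace('_', ' ')} is {value}" for key, value in row.items()]
    return "The applicant's " + ", ".join(parts) + "."

# Example record with assumed column names.
record = {
    "age": 35,
    "annual_income": 48000,
    "loan_purpose": "debt consolidation",
}
print(row_to_sentence(record))
# -> The applicant's age is 35, annual income is 48000,
#    loan purpose is debt consolidation.
```

In a full pipeline, the resulting sentence would be tokenized and passed to FinBERT alongside any free-text features, and the FinBERT and FT-Transformer outputs would then be combined in the ensemble.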
