http://dx.doi.org/10.7838/jsebs.2021.26.3.097

A Comparative Analysis of Ensemble Learning-Based Classification Models for Explainable Term Deposit Subscription Forecasting  

Shin, Zian (Department of Security Convergence Science, Chung-Ang University)
Moon, Jihoon (Chung-Ang University)
Rho, Seungmin (Department of Industrial Security, Chung-Ang University)
Publication Information
The Journal of Society for e-Business Studies / v.26, no.3, 2021, pp. 97-117
Abstract
Predicting term deposit subscriptions is one of the representative financial marketing tasks in banks, and banks can build prediction models from various kinds of customer information. Many studies have applied machine learning techniques to improve the classification accuracy of term deposit subscription prediction. However, even when such models achieve satisfactory performance, they are difficult to adopt in industry if their decision-making process is not adequately explained. To address this issue, this paper proposes an explainable scheme for term deposit subscription forecasting. We first construct several classification models using decision tree-based ensemble learning methods that perform well on tabular data: random forest, gradient boosting machine (GBM), extreme gradient boosting (XGB), and light gradient boosting machine (LightGBM). We then analyze their classification performance in depth through 10-fold cross-validation. Finally, we apply Shapley additive explanations (SHAP), an explainable artificial intelligence technique, to the best-performing classification model, which provides a rationale for interpreting both the influence of customer information and the model's decision-making process. To verify the practicality and validity of our scheme, we conducted experiments on the bank marketing dataset provided by Kaggle: we applied SHAP to the GBM and LightGBM models, respectively, according to different dataset configurations, and then analyzed and visualized the results for explainable term deposit subscription forecasting.
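The two analysis steps named in the abstract (10-fold cross-validation, then Shapley-value attribution on the chosen model) can be illustrated with a small, dependency-free Python sketch. This is not the authors' code. Stratifying the folds by class is an assumption (the abstract only says 10-fold cross-validation), `toy_score` is a hypothetical hand-written stand-in for a trained GBM or LightGBM model, and the feature names are illustrative only. The Shapley values are computed exactly by enumerating feature coalitions, with absent features set to a single baseline standing in for SHAP's expectation over background data; this makes the additive property that SHAP relies on easy to check.

```python
from itertools import combinations
from math import factorial
import random


def stratified_k_fold(labels, k=10, seed=0):
    """Split sample indices into k test folds with similar class ratios.

    Indices of each class are shuffled, concatenated class by class, and then
    dealt round-robin, so every fold gets floor or ceil of n_c / k samples of
    each class c, and fold sizes differ by at most one.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    order = []
    for idx in by_class.values():
        rng.shuffle(idx)
        order.extend(idx)
    folds = [[] for _ in range(k)]
    for pos, i in enumerate(order):
        folds[pos % k].append(i)
    return folds


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x by enumerating all feature coalitions.

    A feature outside the coalition is replaced by its baseline value.
    This needs on the order of n * 2^n model calls, so it suits small n only.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition keep x's value; the rest use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical stand-in for a trained classifier's score (not from the paper):
# a linear part plus one interaction between features 0 and 2.
FEATURES = ["duration", "balance", "age"]  # illustrative names only


def toy_score(z):
    return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[2]


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 90 non-subscribers, 10 subscribers.
    labels = [0] * 90 + [1] * 10
    folds = stratified_k_fold(labels, k=10)
    print([sum(labels[i] for i in fold) for fold in folds])
    # -> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

    x, baseline = [2.0, 1.0, 1.0], [0.0, 0.0, 0.0]
    phi = shapley_values(toy_score, x, baseline)
    for name, p in zip(FEATURES, phi):
        print(f"{name}: {p:+.3f}")
    # -> duration: +1.300 / balance: +0.200 / age: +0.300

    # Local accuracy: contributions sum to f(x) - f(baseline).
    print(abs(sum(phi) - (toy_score(x) - toy_score(baseline))) < 1e-9)  # -> True
```

The interaction term in `toy_score` is split equally between `duration` and `age`, which is how Shapley values share credit for a symmetric interaction. Because exact enumeration grows exponentially with the number of features, tree ensembles such as GBM, XGB, and LightGBM are usually explained with the tree-specific TreeSHAP algorithm (exposed as `shap.TreeExplainer` in the `shap` Python package), which computes the attributions in polynomial time.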
Keywords
Financial Marketing; Term Deposit Subscription Forecasting; Explainable Artificial Intelligence; Ensemble Learning; Bagging; Boosting