A Fusion Method of Co-training and Label Propagation for Prediction of Bank Telemarketing

  • 김아름 (Department of Computer Science, Yonsei University)
  • 조성배 (Department of Computer Science, Yonsei University)
  • Received : 2017.02.07
  • Accepted : 2017.04.28
  • Published : 2017.07.15

Abstract

Telemarketing has become central to corporate marketing in the information society. Machine learning is now widely applied to financial data with good results, but most approaches are supervised and therefore need large amounts of labeled data. Financial data are mostly unlabeled, and labeling them by hand is difficult. In this paper, we propose a fusion of two semi-supervised learning methods that automatically labels unlabeled data in order to select target customers for telemarketing. Specifically, we label the unlabeled data with both label propagation and decision-tree-based co-training. Samples labeled with low reliability are removed, and only the samples to which the two methods assign the same label are kept. These samples are added to the training set, a decision tree is trained on the enlarged set, and the tree is evaluated on test data. To confirm the usefulness of the proposed method, we conduct experiments on a real telemarketing dataset from a Portuguese bank. The proposed method achieves an accuracy of 83.39%, which is 1.82% higher than that of the conventional method, and a precision of 19.37%, which is 2.67% higher. A t-test shows that the improvement is statistically significant.
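
The selection step described in the abstract (remove low-reliability samples, then keep only the samples on which both labeling methods agree) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fuse_pseudo_labels`, the `(label, confidence)` representation, the `min_confidence` threshold of 0.9, and the toy numbers are all hypothetical, and the abstract does not say how reliability is measured.

```python
from typing import List, Sequence, Tuple

# (predicted label, confidence in [0, 1]); this representation is an assumption.
Prediction = Tuple[int, float]


def fuse_pseudo_labels(
    lp_preds: Sequence[Prediction],
    ct_preds: Sequence[Prediction],
    min_confidence: float = 0.9,  # assumed threshold, not taken from the paper
) -> List[Tuple[int, int]]:
    """Return (sample index, label) pairs that pass both filters."""
    if len(lp_preds) != len(ct_preds):
        raise ValueError("both labelers must score the same samples")
    kept = []
    for i, ((lp_label, lp_conf), (ct_label, ct_conf)) in enumerate(
        zip(lp_preds, ct_preds)
    ):
        # Step 1: drop samples that either labeler is unsure about.
        if lp_conf < min_confidence or ct_conf < min_confidence:
            continue
        # Step 2: keep only samples on which the two labelers agree.
        if lp_label == ct_label:
            kept.append((i, lp_label))
    return kept


# Toy outputs for four unlabeled samples (made-up numbers for illustration).
label_propagation = [(1, 0.95), (0, 0.97), (1, 0.60), (0, 0.99)]
co_training = [(1, 0.92), (1, 0.94), (1, 0.98), (0, 0.91)]

pseudo_labeled = fuse_pseudo_labels(label_propagation, co_training)
print(pseudo_labeled)  # [(0, 1), (3, 0)]
```

In this toy run, sample 1 is dropped because the two labelers disagree and sample 2 is dropped because one confidence falls below the threshold. In a full pipeline, the surviving pairs would be appended to the labeled training set before the final decision tree is trained.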

Acknowledgement

Supported by: Institute for Information & Communications Technology Promotion (IITP)
