http://dx.doi.org/10.3837/tiis.2020.12.005

The Impact of Transforming Unstructured Data into Structured Data on a Churn Prediction Model for Loan Customers  

Jung, Hoon (Hana Institute of Finance)
Lee, Bong Gyou (Yonsei University)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.14, no.12, 2020, pp. 4706-4724
Abstract
This study analyzed the voice of customer (VOC), text data containing contact history and counseling details, together with various structured data such as company size, loan balance, and savings accounts. To analyze the unstructured data, term frequency-inverse document frequency (TF-IDF) analysis, semantic network analysis, sentiment analysis, and a convolutional neural network (CNN) were implemented. A performance comparison of the models revealed that the predictive model using the CNN had the highest predictive power, followed by the model using TF-IDF and then the model using semantic network analysis. In particular, a character-level CNN and a word-level CNN were developed separately, and the character-level CNN performed better in an analysis of Korean-language text. Moreover, a systematic selection model for optimal text mining techniques was proposed, suggesting which analytical technique is appropriate for analyzing text data depending on the context. This study also provides evidence that the findings of previous studies, which indicate that individual customers leave when their loyalty and switching costs are low, also apply to corporate customers, and it suggests that VOC data indicating customers' needs are very effective for predicting their behavior.
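
As a rough illustration of the TF-IDF step named in the abstract, the sketch below turns a few short texts into per-term weights that could be appended to structured features. It is a minimal sketch under stated assumptions, not the authors' implementation: the weighting uses one common formulation (term count divided by document length, times log(N / document frequency)), tokenization is a plain whitespace split, and the sample sentences are invented. Korean VOC text would likely need a morphological analyzer or character-level segmentation instead.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    # Assumption: whitespace tokenization, lowercased.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical VOC snippets, invented for illustration only.
voc = [
    "customer asked about loan rate",
    "customer asked about early repayment",
    "customer complained about loan rate",
]
scores = tf_idf(voc)
```

Terms that occur in every document, such as "customer" here, receive a weight of zero, while terms confined to a single record, such as "repayment", receive the highest weights. This is why TF-IDF features tend to highlight what is distinctive about one customer's contact history.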
Keywords
Churn Prediction Model; Text Mining; Unstructured Data; Voice of Customer; Convolutional Neural Network;