Prediction of Customer Satisfaction Using RFE-SHAP Feature Selection Method

Predicting Customer Satisfaction from Online Reviews Using RFE-SHAP

  • Olga Chernyaeva (College of Business Administration, Pusan National University) ;
  • Taeho Hong (College of Business Administration, Pusan National University)
  • Received : 2023.11.13
  • Accepted : 2023.12.21
  • Published : 2023.12.31

Abstract

In the rapidly evolving domain of e-commerce, this study presents an approach to predicting customer satisfaction from online reviews that pairs methodological innovation with practical insight. We integrate RFE-SHAP feature selection with LDA topic modeling to streamline predictive analytics in e-commerce. This integration identifies the key features, narrowing an initial set of 28 down to an optimal subset of 14 for the Random Forest algorithm. The approach mitigates the overfitting common in models with too many features and leads to an improved accuracy of 84% for our Random Forest model. Central to our analysis is the finding that certain aspects of review content, such as quality, fit, and durability, play a pivotal role in customer satisfaction, especially in the clothing sector. We explain how each selected feature affects customer satisfaction, giving a detailed view of the elements customers value most. Our research contributes in two areas. First, it enhances predictive modeling in e-commerce analytics by introducing a streamlined, feature-centric approach. This refinement improves the accuracy of customer satisfaction predictions and offers a new approach to feature selection in predictive models. Second, the study provides actionable insights for e-commerce platforms, especially those in the clothing sector. By highlighting which aspects of customer reviews (such as quality, fit, and durability) most influence satisfaction, we offer a strategic direction for businesses to tailor their products and services.

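This study proposes a new approach to predicting customer satisfaction from online reviews. Using the RFE-SHAP feature selection method combined with LDA topic modeling, we identified the key features that strongly affect customer satisfaction and thereby improved predictive analytics. For the Random Forest algorithm, 14 variables were extracted as the optimal subset from the initial 28 input variables. With the proposed method, the Random Forest model achieved a performance of 84% and avoided the overfitting that commonly occurs in models with many variables. We also found that specific elements of reviews, such as quality, fit, and durability, play an important role in increasing consumer satisfaction in the fashion industry. In explaining the prediction results, this study describes in detail how each selected feature affects customer satisfaction and gives a fine-grained view of the aspects customers consider most important. The contributions of this study are as follows. First, it improves methodology by strengthening predictive modeling in e-commerce analytics and introducing a feature-centric approach. This not only raises the accuracy of customer satisfaction prediction but also presents a new approach to variable selection in predictive models. Second, it provides concrete insights for e-commerce platforms, particularly in the clothing sector. By highlighting which parts of customer reviews, such as quality, size, and durability, have the greatest influence on satisfaction, it suggests a strategic direction in which companies can tailor their products and services. Such targeted improvements are expected to improve customers' shopping experience and to foster loyalty while raising satisfaction.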
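The RFE-SHAP selection loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. To stay dependency-free, a least-squares linear model stands in for the Random Forest, and the closed-form SHAP value of a linear model under an independent-features assumption, phi_j(x) = w_j (x_j - mean_j), stands in for tree-based SHAP values. The data are synthetic. The function names `fit_linear`, `mean_abs_shap`, and `rfe_shap`, the `noise_*` features, and the target size of 3 (rather than the paper's 14 out of 28) are all assumptions made for illustration.

```python
import random

def fit_linear(X, y, lr=0.1, epochs=300):
    """Fit a least-squares linear model by batch gradient descent.
    (Stand-in for the Random Forest used in the paper.)"""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = b + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w, b

def mean_abs_shap(X, w):
    """Mean |SHAP| per feature for a linear model, assuming independent
    features: phi_j(x) = w_j * (x_j - mean_j)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [sum(abs(w[j] * (row[j] - means[j])) for row in X) / n
            for j in range(d)]

def rfe_shap(X, y, names, n_keep):
    """Recursively refit, then drop the feature with the smallest mean |SHAP|."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = fit_linear(X_sub, y)
        scores = mean_abs_shap(X_sub, w)
        weakest = min(range(len(active)), key=scores.__getitem__)
        del active[weakest]
    return [names[j] for j in active]

# Synthetic data (hypothetical): only the first three features carry signal.
rng = random.Random(0)
names = ["quality", "fit", "durability", "noise_a", "noise_b", "noise_c"]
X = [[rng.gauss(0, 1) for _ in names] for _ in range(200)]
y = [2.0 * r[0] + 1.5 * r[1] - 1.0 * r[2] + rng.gauss(0, 0.1) for r in X]

selected = rfe_shap(X, y, names, n_keep=3)
print(selected)
```

Refitting after every elimination is what distinguishes this recursive scheme from a one-shot ranking: importance scores are recomputed on the reduced feature set, so a feature whose contribution was masked by a removed one can still survive. In practice the stopping point would be chosen by validation performance rather than fixed in advance.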

Acknowledgement

This work was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (No. IITP-2022-2020-0-01797) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation). Additionally, this research was funded by an IITP grant as a part of the Convergence Security Core Talent Training Business at Pusan National University (No. 2022-0-01201).
