A Study on Machine Learning-Based Modelling of Online Review Sentiment Analysis

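  • Minsu Kim (Division of Computer Engineering, College of IT Engineering, Hansung University)
  • Juhee Kim (College of Culture, Knowledge and Convergence, Dongduk Women's University)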
  • Received : 2024.09.20
  • Accepted : 2024.10.21
  • Published : 2024.10.31

Abstract

Online reviews play a crucial role in assessing a company's market value and significantly influence its profitability. Sentiment analysis of online reviews has therefore emerged as a key indicator for predicting business success. This study examines restaurant reviews from Yelp, one of the leading online review platforms, using the Yelp Open Dataset. Six machine learning algorithms were applied to predict the sentiment polarity of these reviews: Logistic Regression, Support Vector Machine (SVM), Random Forest, Gradient Boosting Machine (GBM), XGBoost, and LightGBM. In the performance evaluation, Logistic Regression, SVM, and LightGBM achieved the highest accuracy, 0.91. The primary contribution of this study is that it quantifies unstructured review text so that ratings can be predicted from it, enabling businesses, including startups, to analyze customer feedback effectively. These insights are expected to help business owners forecast consumer behavior and develop marketing strategies.
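
As a rough illustration of polarity prediction from review text, the following minimal sketch trains a bag-of-words logistic regression in pure Python. It is not the study's implementation: the abstract does not state the feature representation, the rule for deriving polarity labels, or any hyperparameters. The toy reviews, the star-to-polarity threshold in `to_polarity`, the tokenizer, and the training settings are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy reviews (text, stars); NOT taken from the Yelp Open Dataset.
REVIEWS = [
    ("great food and friendly service", 5),
    ("delicious pasta, great atmosphere", 4),
    ("friendly staff and delicious dessert", 5),
    ("terrible service and cold food", 1),
    ("rude staff, terrible experience", 2),
    ("cold pasta and rude waiter", 1),
    ("average place, nothing special", 3),
]


def to_polarity(stars):
    """Assumed labelling rule: >= 4 stars is positive (1), <= 2 is negative (0), 3 is dropped."""
    if stars >= 4:
        return 1
    if stars <= 2:
        return 0
    return None


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def predict_proba(weights, bias, tokens):
    """Probability of the positive class under a bag-of-words logistic model."""
    z = bias + sum(weights.get(tok, 0.0) * n for tok, n in Counter(tokens).items())
    return 1.0 / (1.0 + math.exp(-z))


def train(docs, labels, epochs=200, lr=0.5):
    """Fit the weights with per-sample gradient descent on the log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for tokens, y in zip(docs, labels):
            error = predict_proba(weights, bias, tokens) - y  # d(log loss)/dz
            for tok, n in Counter(tokens).items():
                weights[tok] = weights.get(tok, 0.0) - lr * error * n
            bias -= lr * error
    return weights, bias


# Keep only reviews with a clear polarity under the assumed rule.
labelled = [(text, to_polarity(stars)) for text, stars in REVIEWS
            if to_polarity(stars) is not None]
docs = [tokenize(text) for text, _ in labelled]
labels = [y for _, y in labelled]

weights, bias = train(docs, labels)


def predict(text):
    """Return 1 for predicted positive polarity, 0 for negative."""
    return int(predict_proba(weights, bias, tokenize(text)) >= 0.5)


# Accuracy on the training reviews only; a real evaluation needs a held-out split.
train_accuracy = sum(predict(text) == y for text, y in labelled) / len(labelled)
print("training accuracy:", train_accuracy)
```

Accuracy here is measured on the training examples only, so it says nothing about generalization. Comparing models as the study does would require a held-out test split and library implementations of the other five algorithms.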
