http://dx.doi.org/10.13088/jiis.2022.28.4.347

The Detection of Online Manipulated Reviews Using Machine Learning and GPT-3  

Chernyaeva, Olga (College of Business Administration, Pusan National University)
Hong, Taeho (College of Business Administration, Pusan National University)
Publication Information
Journal of Intelligence and Information Systems / v.28, no.4, 2022, pp. 347-364
Abstract
Fraudulent companies and sellers strategically manipulate reviews to influence customers' purchase decisions, so the reliability of reviews has become crucial for customer decision-making. Because customers increasingly rely on online reviews for detailed information about products or services before purchasing, many researchers focus on detecting manipulated reviews. The main obstacle is the difficulty of obtaining enough manipulated reviews to train machine learning models. In addition, the number of manipulated reviews is small compared with the number of non-manipulated reviews, which creates a class imbalance problem. The under-represented class can hamper a model's accuracy, so solving the class imbalance problem is important for building an accurate model for detecting manipulated reviews. We therefore propose an OpenAI-based review generation model to address the imbalance in manipulated reviews and thereby improve detection accuracy. In this research, we applied the autoregressive language model GPT-3 to generate reviews based on manipulated reviews. We found that using GPT-3 to oversample manipulated reviews recovers a satisfactory portion of the performance loss and yields better classification performance (logit, decision tree, neural networks) than traditional oversampling methods such as random oversampling and SMOTE.
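As an illustration of the class imbalance problem and of the random-oversampling baseline named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `random_oversample`, the binary 0/1 labels, and the toy reviews are all assumptions made for the example. The GPT-3 approach would replace the duplication step with newly generated reviews; that step is not shown because the abstract gives no prompts or model settings.

```python
import random
from collections import Counter


def random_oversample(texts, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority examples until both classes
    have the same number of examples (binary labels assumed)."""
    rng = random.Random(seed)
    minority = [t for t, y in zip(texts, labels) if y == minority_label]
    if not minority:
        raise ValueError("no examples with the minority label")
    majority_count = sum(1 for y in labels if y != minority_label)
    n_extra = max(0, majority_count - len(minority))
    # Random oversampling adds exact copies; no new text is created.
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return texts + extra, labels + [minority_label] * n_extra


# Hypothetical toy data: 1 = manipulated review, 0 = non-manipulated review.
reviews = [
    "Great room, friendly staff.",
    "Average stay, nothing special.",
    "Clean and quiet.",
    "Breakfast was cold.",
    "Good location near the station.",
    "BEST HOTEL EVER!!! Everyone must book now!!!",
]
labels = [0, 0, 0, 0, 0, 1]

balanced_reviews, balanced_labels = random_oversample(reviews, labels, minority_label=1)
print(Counter(labels))
print(Counter(balanced_labels))
```

Because random oversampling only repeats existing minority examples, the balanced set contains no new wording, which is the limitation that generating additional reviews is meant to address.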
Keywords
Text Mining; Online Reviews; Manipulated Reviews Detection; Text Generation; Class Imbalance Problem;