[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.13088/jiis.2022.28.4.347

The Detection of Online Manipulated Reviews Using Machine Learning and GPT-3

Chernyaeva, Olga (College of Business Administration, Pusan National University)
Hong, Taeho (College of Business Administration, Pusan National University)

Publication Information

Journal of Intelligence and Information Systems / v.28, no.4, 2022, pp. 347-364

Abstract

Fraudulent companies and sellers strategically manipulate reviews to influence customers' purchase decisions, so the reliability of reviews has become crucial for customer decision-making. Because customers increasingly rely on online reviews for detailed information about products or services before purchasing, many researchers focus on detecting manipulated reviews. The main obstacle is the difficulty of obtaining enough manipulated reviews to train machine learning models. In addition, the number of manipulated reviews is small compared with the number of non-manipulated reviews, which creates a class imbalance problem: the class with fewer examples is under-represented, and this can hamper a model's accuracy. Solving the class imbalance problem is therefore important for building an accurate model for detecting manipulated reviews. We propose an OpenAI-based review generation model to address this imbalance and thereby improve the accuracy of manipulated review detection. Specifically, we applied the autoregressive language model GPT-3 to generate new reviews based on existing manipulated reviews. We found that oversampling manipulated reviews with GPT-3 recovers a satisfactory portion of the performance loss and yields better classification performance (logit, decision tree, neural networks) than traditional oversampling methods such as random oversampling and SMOTE.
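The two traditional baselines named in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so the function names, the toy reviews, and the labels below are all hypothetical. Random oversampling duplicates minority-class examples until the classes are balanced. SMOTE instead creates synthetic points by interpolating between a minority example and one of its minority-class neighbors in feature space; only that interpolation step is shown here, not the nearest-neighbor search.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every
    class has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_texts.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_texts, out_labels


def interpolate(x, neighbor, lam):
    """SMOTE-style interpolation step: x + lam * (neighbor - x), 0 <= lam <= 1."""
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


# Hypothetical toy data: 1 = manipulated, 0 = non-manipulated.
reviews = ["great stay", "ok room", "noisy street", "friendly staff",
           "BEST HOTEL EVER!!!"]
labels = [0, 0, 0, 0, 1]

balanced_reviews, balanced_labels = random_oversample(reviews, labels)
print(Counter(balanced_labels))

# Interpolating halfway between two hypothetical minority feature vectors.
synthetic = interpolate([1.0, 0.0], [3.0, 4.0], 0.5)
print(synthetic)
```

Both baselines only reuse or blend existing examples. The approach described in the abstract replaces this step with new review text generated by GPT-3 from the manipulated reviews; that step is not sketched here because the abstract does not specify the prompts or model settings.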

Keywords

Text Mining; Online Reviews; Manipulated Reviews Detection; Text Generation; Class Imbalance Problem

