[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3837/tiis.2022.03.005

Sentiment Analysis of Product Reviews to Identify Deceptive Rating Information in Social Media: A SentiDeceptive Approach

Marwat, M. Irfan (Department of Software Engineering, University of Science and Technology Bannu)
Khan, Javed Ali (Department of Software Engineering, University of Science and Technology Bannu)
Alshehri, Dr. Mohammad Dahman (Department of Computer Science, College of Computers and Information Technology, Taif University)
Ali, Muhammad Asghar (Department of Software Engineering, University of Science and Technology Bannu)
Hizbullah (Department of Software Engineering, University of Science and Technology Bannu)
Ali, Haider (Department of Software Engineering, University of Science and Technology Bannu)
Assam, Muhammad (College of Computer Science, Zhejiang University)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.16, no.3, 2022, pp. 830-860

Abstract

[Introduction] Many companies are moving their businesses online as customers increasingly prefer to shop online. [Problem] Users share a vast amount of information about products, which makes it difficult for end-users to reach purchasing decisions. [Motivation] We therefore need a mechanism that automatically analyzes end-user opinions, thoughts, and feelings about products on social media platforms, which can help customers make or change their decisions about purchasing specific products. [Proposed Solution] For this purpose, we propose an automated SentiDeceptive approach, which classifies end-user reviews into negative, positive, and neutral sentiments and identifies deceptive crowd-user rating information on social media platforms to support user decision-making. [Methodology] We first collected 11,781 end-user comments from the Amazon store and the Flipkart web application covering distinct products, such as watches, mobiles, shoes, clothes, and perfumes. Next, we developed a coding guideline to serve as the basis for the comment annotation process. We then applied the content analysis approach and the existing VADER library to annotate the end-user comments in the data set with the identified codes, which resulted in a labelled data set used as input to the machine learning classifiers. Finally, we applied the sentiment analysis approach to identify end-user opinions and overcome deceptive rating information on social media platforms in four steps: preprocessing the input data to remove irrelevant content (stop words, special characters, etc.); balancing the data set with two standard resampling approaches, i.e., oversampling and under-sampling; extracting different features (TF-IDF and BOW) from the textual data; and training and testing the machine learning algorithms with standard cross-validation approaches (KFold and Shuffle Split). [Results/Outcomes] Furthermore, to support our research study, we developed an automated tool that analyzes each piece of customer feedback and displays the collective customer sentiment about a specific product as a graph, which helps customers make decisions. In a nutshell, the proposed approach produces good results when identifying customer sentiments from online user feedback, i.e., it obtained an average of 94.01% precision, 93.69% recall, and 93.81% F-measure for classifying positive sentiments.
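The methodology steps named in the abstract (preprocessing, oversampling, BOW/TF-IDF feature extraction, k-fold splitting, and precision/recall/F-measure) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the stop-word list, the sample reviews, and all function names are hypothetical, the TF-IDF weighting shown is one common variant, and VADER annotation and classifier training are omitted.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which list was used.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "of", "to"}

def preprocess(text):
    """Lowercase, drop special characters and digits, remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def oversample(samples, labels, seed=0):
    """Randomly duplicate samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels

def bag_of_words(docs):
    """BOW: raw term counts per document."""
    return [Counter(doc) for doc in docs]

def tf_idf(docs):
    """TF-IDF with tf = count / document length and idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        yield train, test
        start += size

def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Made-up reviews and labels, for illustration only.
reviews = [
    "This watch is GREAT!!!",
    "Terrible shoes, poor quality.",
    "Great perfume, great price",
    "The phone arrived.",
]
labels = ["positive", "negative", "positive", "neutral"]

tokens = [preprocess(r) for r in reviews]
balanced_docs, balanced_labels = oversample(tokens, labels)
counts = bag_of_words(balanced_docs)
features = tf_idf(balanced_docs)
folds = list(k_fold_indices(len(balanced_docs), 3))
```

In practice each fold's training indices would select the feature vectors used to fit a classifier, and `precision_recall_f1` would score its predictions on the held-out indices. To avoid leaking duplicated samples into the test fold, resampling is usually applied to the training split only; the sketch balances the whole set first purely for brevity.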

Keywords

Sentiment Analysis; Opinion Mining; Customer Reviews; Natural Language Processing; Imbalance; Deceptive Reviews; Flipkart; Amazon

Citations & Related Records

Times Cited By KSCI: 2
