http://dx.doi.org/10.13088/jiis.2016.22.3.129

Increasing Accuracy of Classifying Useful Reviews by Removing Neutral Terms  

Lee, Minsik (Department of Business Administration, The Catholic University of Korea)
Lee, Hong Joo (Department of Business Administration, The Catholic University of Korea)
Publication Information
Journal of Intelligence and Information Systems / v.22, no.3, 2016, pp. 129-142
Abstract
Customer product reviews have become one of the important factors in purchase decision making. Customers believe that reviews written by others who have already experienced a product offer more reliable information than that provided by sellers. However, because there are so many products and reviews, the advantages of e-commerce can be overwhelmed by increasing search costs; reading all of the reviews to find out the pros and cons of a certain product can be exhausting. To help users find the most useful information about products without much difficulty, e-commerce companies provide various ways for customers to write and rate product reviews, and online stores have devised various ways to surface useful customer reviews for potential customers. Different methods have been developed to classify and recommend useful reviews, primarily using feedback provided by customers about the helpfulness of reviews. Most shopping websites provide customer reviews together with the average preference for a product, the number of customers who have participated in preference voting, and the preference distribution. Most information on the helpfulness of product reviews is collected through a voting system. Amazon.com asks customers whether a review of a certain product is helpful, and it places the most helpful favorable review and the most helpful critical review at the top of the list of product reviews. Some companies also predict the usefulness of a review based on attributes such as its length, its author(s), and the words used, publishing only reviews that are likely to be useful.

Text mining approaches have been used to classify useful reviews in advance. To apply a text mining approach to all reviews for a product, we need to build a term-document matrix: we extract all words from the reviews and build a matrix with the number of occurrences of each term in each review. Because there are many reviews, the term-document matrix becomes very large, which makes it difficult to apply text mining algorithms. Researchers therefore delete sparse terms, since words that rarely appear have little effect on classification or prediction. The purpose of this study is to suggest a better way of building the term-document matrix by deleting useless terms for review classification. In this study, we propose a neutrality index for selecting the words to be deleted. Many words appear in both classes, useful and not useful, and these words have little or even a negative effect on classification performance. We define such words as neutral terms and delete the neutral terms that appear with similar frequency in both classes. That is, after deleting sparse words, we additionally select words to be deleted in terms of neutrality.

We tested our approach with Amazon.com review data from five product categories: Cellphones & Accessories, Movies & TV, Automotive, CDs & Vinyl, and Clothing, Shoes & Jewelry. We used reviews that received more than four votes from users, and a ratio of 60% useful votes among total votes was the threshold for classifying reviews as useful or not useful. We randomly selected 1,500 useful reviews and 1,500 not-useful reviews for each product category, applied Information Gain and Support Vector Machine algorithms to classify the reviews, and compared the classification performance in terms of precision, recall, and F-measure.
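As a rough illustration of the term-filtering procedure described above, the following Python sketch builds a term-document matrix, removes sparse terms, and then drops terms whose relative document frequency is nearly identical in the useful and not-useful classes. The neutrality formula, the cutoff values, the helper name filter_terms, and the use of scikit-learn are assumptions made for illustration only; the abstract does not specify the exact definition of the neutrality index.

    # Illustrative sketch only (assumed definitions, not the authors' code):
    # build a term-document matrix, drop sparse terms, then drop "neutral" terms
    # that occur with similar relative frequency in useful and not-useful reviews.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer

    def filter_terms(reviews, labels, min_df=0.01, neutrality_cutoff=0.9):
        # reviews: list of review texts; labels: 1 = useful, 0 = not useful
        vectorizer = CountVectorizer(min_df=min_df)   # min_df removes sparse terms
        X = vectorizer.fit_transform(reviews)         # documents x terms counts
        terms = np.array(vectorizer.get_feature_names_out())
        y = np.asarray(labels)

        # Document frequency of each term within each class.
        df_useful = np.asarray((X[y == 1] > 0).sum(axis=0)).ravel() / max((y == 1).sum(), 1)
        df_not = np.asarray((X[y == 0] > 0).sum(axis=0)).ravel() / max((y == 0).sum(), 1)

        # Hypothetical neutrality index: close to 1 when a term is equally frequent
        # in both classes, close to 0 when it is concentrated in one class.
        neutrality = 1.0 - np.abs(df_useful - df_not) / (df_useful + df_not + 1e-12)

        keep = neutrality < neutrality_cutoff         # drop highly neutral terms
        return terms[keep], X[:, keep]

Under these assumptions, running filter_terms with different neutrality_cutoff values allows comparing a "sparsity only" term set against a "sparsity plus neutrality" term set, in the spirit of the comparison described in the study.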
Although performance varies across product categories and data sets, deleting terms based on both sparsity and neutrality showed the best performance in terms of F-measure for the two classification algorithms. However, deleting terms based on sparsity alone showed the best recall for Information Gain, and using all terms showed the best precision for SVM. Thus, term-deletion methods and classification algorithms need to be selected carefully based on the data set.
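For the classification and evaluation step, a minimal sketch along the following lines could produce the precision, recall, and F-measure figures referred to above. It assumes scikit-learn's LinearSVC and a simple train/test split; the study's actual SVM settings and its Information Gain procedure are not detailed in the abstract.

    # Minimal evaluation sketch (assumed setup): train a linear SVM on the filtered
    # term-document matrix and report precision, recall, and F-measure.
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC
    from sklearn.metrics import precision_recall_fscore_support

    def evaluate_svm(X, y, test_size=0.3, seed=42):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed, stratify=y)
        clf = LinearSVC().fit(X_tr, y_tr)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_te, clf.predict(X_te), average="binary")
        return precision, recall, f1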
Keywords
Neutrality; Term Removal; Customer Review Classification; Usefulness Index