http://dx.doi.org/10.6109/jkiice.2018.22.9.1153

Evaluation of Classification Algorithm Performance of Sentiment Analysis Using Entropy Score  

Park, Man-Hee (Department of Business Administration, Catholic University of Pusan)
Abstract
Among the many available information sources, online customer reviews and social media content are critical for businesses because they influence customers' decision making. Conventional surveys for identifying customers' varied needs and complaints are limited by time and cost. Customer review data from online shopping malls are therefore an ideal source for analyzing customer sentiment toward products. In this study, we collected Amazon product reviews of Samsung and Apple smartphones. We applied five classification algorithms that previous studies have used as representative sentiment analysis techniques: support vector machines (SVMs), bagging, random forest, classification and regression tree (CART), and maximum entropy. We propose an entropy score that comprehensively evaluates the performance of a classification algorithm. When the five algorithms were evaluated using the entropy score, the SVM algorithm ranked highest.
Keywords
Entropy score; sentiment analysis; classification algorithm; performance evaluation
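The abstract does not state the formula behind the entropy score, so the following is only a minimal sketch of the generic Shannon entropy weight method, which is one common way to combine several performance metrics into a single score. It is not necessarily the author's method. The function names, the choice of three metrics, and the toy values are all hypothetical and are not the paper's data or results.

```python
import math


def entropy_weights(scores):
    """Return one weight per metric using the generic entropy weight method.

    scores: list of rows, one row per classifier; each row holds
    positive metric values (e.g. accuracy, precision, recall).
    Assumption: this is an illustrative scheme, not the paper's formula.
    """
    n = len(scores)
    m = len(scores[0])
    if n < 2:
        raise ValueError("need at least two classifiers to compare")
    k = 1.0 / math.log(n)  # scales each column's entropy into [0, 1]

    diversities = []
    for j in range(m):
        column = [row[j] for row in scores]
        total = sum(column)
        probs = [v / total for v in column]
        entropy = -k * sum(p * math.log(p) for p in probs if p > 0)
        # A metric on which the classifiers differ more has lower entropy
        # and therefore higher diversity.
        diversities.append(max(0.0, 1.0 - entropy))

    d_total = sum(diversities)
    if d_total < 1e-12:
        # All metrics are (numerically) uninformative: use equal weights.
        return [1.0 / m] * m
    return [d / d_total for d in diversities]


def entropy_scores(scores):
    """Weighted sum of each classifier's metrics, using entropy weights."""
    weights = entropy_weights(scores)
    return [sum(w * v for w, v in zip(weights, row)) for row in scores]


if __name__ == "__main__":
    # Hypothetical toy values (accuracy, precision, recall), NOT from the paper.
    toy = {
        "clf_a": [0.80, 0.75, 0.70],
        "clf_b": [0.82, 0.60, 0.72],
        "clf_c": [0.78, 0.90, 0.71],
    }
    rows = list(toy.values())
    for name, score in zip(toy, entropy_scores(rows)):
        print(f"{name}: {score:.4f}")
```

In this scheme a metric that barely separates the classifiers receives almost no weight, so the composite score is driven by the metrics on which the classifiers actually differ.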