http://dx.doi.org/10.9708/jksci.2020.25.08.181

Impact of Word Embedding Methods on Performance of Sentiment Analysis with Machine Learning Techniques  

Park, Hoyeon (Dept. of MIS, Graduate School, Dongguk University)
Kim, Kyoung-jae (Dept. of MIS, Business School, Dongguk University)
Abstract
This study compares word embedding techniques to examine their impact on the performance of sentiment analysis. Sentiment analysis is an opinion mining technique that uses natural language processing to identify and extract subjective information from text, and it can be used to classify the sentiment of product reviews or comments. Because sentiment can be classified as either positive or negative, sentiment analysis can be treated as a general classification problem. Before sentiment analysis can be performed, text must be converted into a form that a computer can process. In natural language processing, transforming text such as a word or document into a vector is called word embedding. Various word embedding techniques are in use, such as Bag of Words, TF-IDF, and Word2Vec. So far, few studies have examined which word embedding techniques are suitable for sentiment analysis. In this study, we use three of these techniques, Bag of Words, TF-IDF, and Word2Vec, and compare their performance on movie review sentiment analysis. The data set for this study is the IMDB data set, which is widely used in text mining. The results show that TF-IDF and Bag of Words outperformed Word2Vec, and that TF-IDF performed better than Bag of Words, although the difference between these two was small.
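To make the difference between these representations concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes a naive whitespace tokenizer and the textbook TF-IDF variant tf(t, d) · log(N / df(t)). Library implementations such as scikit-learn's `TfidfVectorizer` default to a smoothed idf and L2-normalized rows, so their numbers will differ. The helper names (`tokenize`, `build_vocab`, `bag_of_words`, `tf_idf`, `average_embedding`) and the toy reviews are hypothetical. Word2Vec itself is not trained here: `average_embedding` only illustrates one common way of turning pre-trained word vectors into a fixed-length review vector. The abstract does not say how the study formed document vectors, so this step is an assumption.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    # Real pipelines usually also strip punctuation and markup.
    return text.lower().split()


def build_vocab(docs):
    # Sorted vocabulary so that each term has a fixed vector position.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bag_of_words(docs, vocab):
    # Bag of Words: each document becomes a vector of raw term counts.
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vectors


def tf_idf(docs, vocab):
    # Textbook (unsmoothed, unnormalized) TF-IDF, an assumed variant:
    #   tf(t, d) = count of t in d / number of tokens in d
    #   idf(t)   = log(N / df(t)), df(t) = number of docs containing t
    n_docs = len(docs)
    tokenized = [tokenize(doc) for doc in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append([
            (counts[term] / total) * math.log(n_docs / df[term]) if total else 0.0
            for term in vocab
        ])
    return vectors


def average_embedding(doc, embeddings, dim):
    # Document vector = mean of the word vectors of in-vocabulary tokens.
    # Tokens missing from `embeddings` are skipped; an empty match yields zeros.
    known = [embeddings[tok] for tok in tokenize(doc) if tok in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    reviews = ["a great movie", "a terrible movie", "great acting great story"]
    vocab = build_vocab(reviews)
    print(vocab)  # ['a', 'acting', 'great', 'movie', 'story', 'terrible']
    print(bag_of_words(reviews, vocab))  # [[1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1], [0, 1, 2, 0, 1, 0]]
    for row in tf_idf(reviews, vocab):
        print([round(w, 3) for w in row])
    # Toy 2-d vectors, made up for illustration, NOT trained Word2Vec output.
    toy = {"great": [1.0, 0.0], "terrible": [-1.0, 0.0], "movie": [0.0, 1.0]}
    print(average_embedding("a great movie", toy, 2))  # [0.5, 0.5]
```

On the toy reviews, Bag of Words gives "a" and "terrible" the same weight (a count of 1) in the second review. TF-IDF gives "terrible" the larger weight because "a" appears in two of the three reviews. This down-weighting of common, sentiment-neutral words is the main thing TF-IDF adds over raw counts.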
Keywords
sentiment analysis; Bag of Words; TF-IDF; Word2Vec; machine learning
Citations & Related Records
Times Cited by KSCI: 9