http://dx.doi.org/10.9728/dcs.2018.19.1.133

Effective Korean sentiment classification method using word2vec and ensemble classifier  

Park, Sung Soo (SKK Business School, Sungkyunkwan University)
Lee, Kun Chang (SKK Business School, Sungkyunkwan University)
Publication Information
Journal of Digital Contents Society / v.19, no.1, 2018, pp. 133-140
Abstract
Accurate sentiment classification is an important research topic in sentiment analysis. This study proposes an efficient method for classifying Korean sentiment using word2vec and ensemble methods, both of which have been widely studied in recent years. For 200,000 Korean movie review texts, we generated a part-of-speech (POS)-based bag-of-words (BOW) feature, a word2vec feature, and an integrated feature that combines the two representations. For sentiment classification, we used four single classifiers (Logistic Regression, Decision Tree, Naive Bayes, and Support Vector Machine) and four ensemble classifiers (Adaptive Boosting, Bagging, Gradient Boosting, and Random Forest). The integrated feature representation, composed of a BOW feature that includes adjectives and adverbs and a word2vec feature, achieved the highest sentiment classification accuracy. Empirical results show that SVM, a single classifier, achieved the highest performance, while the ensemble classifiers performed similarly to or slightly worse than the single classifier.
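The integrated feature representation described in the abstract can be illustrated with a minimal sketch. The abstract does not state how word-level word2vec vectors are aggregated into a review-level feature, so averaging is an assumption here, as are the toy two-dimensional embeddings, the English placeholder tokens, and every function name. A real pipeline would tokenize Korean text with a morphological analyzer and train word2vec on the review corpus.

```python
from collections import Counter

# Toy, hand-made 2-dimensional "embeddings" standing in for trained word2vec
# vectors (hypothetical values, for illustration only).
TOY_EMBEDDINGS = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}


def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def mean_embedding(tokens, embeddings, dim):
    """Average the embeddings of known tokens; return zeros if none are known.

    Averaging is an assumed aggregation, not taken from the abstract.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def integrated_features(tokens, vocabulary, embeddings, dim):
    """Concatenate the BOW vector with the averaged embedding vector."""
    return bow_vector(tokens, vocabulary) + mean_embedding(tokens, embeddings, dim)


# Hypothetical vocabulary, e.g. the words kept after POS filtering.
vocabulary = ["good", "bad"]
features = integrated_features(
    ["good", "movie", "good"], vocabulary, TOY_EMBEDDINGS, 2
)
print(features)
```

The first two entries of `features` are the BOW counts and the last two are the averaged embedding. The concatenated vector is what a downstream classifier, single or ensemble, would receive as input.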
Keywords
Sentiment classification; feature representation; Bag-of-words (BOW); word2vec; ensemble classifier