[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.14400/JDC.2019.17.2.163

Investigating Opinion Mining Performance by Combining Feature Selection Methods with Word Embedding and BOW (Bag-of-Words)

Eo, Kyun Sun (SKK Business School, Sungkyunkwan University)
Lee, Kun Chang (Global Business Administration/Dept of Health Sciences & Technology, SHAIHST, Sungkyunkwan University)

Publication Information

Journal of Digital Convergence / v.17, no.2, 2019, pp. 163-170

Abstract

Over the past decade, the growth of the Web has explosively increased the volume of available data. Feature selection is an important step in extracting valuable information from such large amounts of data. This study proposes a novel opinion mining model that combines feature selection (FS) methods with word embedding (Word2vec) and BOW (Bag-of-Words) features. The FS methods adopted in this study are CFS (Correlation-based FS) and IG (Information Gain). To select an optimal FS method, a number of classifiers were evaluated, ranging from LR (logistic regression), NN (neural network), and NBN (naive Bayesian network) to RF (random forest), RS (random subspace), and ST (stacking). Empirical results on the electronics and kitchen datasets showed that the LR and ST classifiers combined with IG applied to BOW features yield the best opinion mining performance. Results on the laptop and restaurant datasets revealed that the RF classifier using IG applied to Word2vec features achieves the best performance.
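To make the IG step concrete, the following is a minimal sketch of Information Gain scoring over binary BOW features, written in plain Python. It is not the authors' implementation: the toy reviews, the presence/absence encoding of terms, and the helper names `entropy`, `information_gain`, and `select_top_k` are all assumptions made for illustration, and the abstract does not state the preprocessing or tooling actually used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG of a binary 'term present / term absent' BOW feature w.r.t. the labels."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    # Conditional entropy H(class | term), weighted by the size of each split.
    conditional = sum(
        len(part) / total * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Rank the vocabulary by IG (ties broken alphabetically) and keep k terms."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy corpus; the datasets used in the study are not reproduced here.
reviews = [
    ("great battery and great screen", "pos"),
    ("great value works well", "pos"),
    ("poor battery and poor build", "neg"),
    ("poor value stopped working", "neg"),
]
docs = [set(text.split()) for text, _ in reviews]  # binary BOW: term presence only
labels = [label for _, label in reviews]

top_terms = select_top_k(docs, labels, 2)
print(top_terms)
```

In this toy corpus, "great" and "poor" separate the two classes perfectly and so receive the highest scores, while a term such as "battery" that appears equally in both classes scores zero. In a full pipeline, the selected terms would define the reduced feature vectors passed to a classifier such as LR or RF.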

Keywords

Word embedding; Opinion mining; Sentiment analysis; Feature selection; Machine learning

