
http://dx.doi.org/10.9708/jksci.2019.24.02.171

Exploring an Optimal Feature Selection Method for Effective Opinion Mining Tasks

Eo, Kyun Sun (SKKU Business School, Sungkyunkwan University)
Lee, Kun Chang (SKKU Business School/SAIHST (Samsung Advanced Institute of Health Sciences & Technology), Sungkyunkwan University)

Publication Information

Journal of the Korea Society of Computer and Information / v.24, no.2, 2019, pp. 171-177

Abstract

This paper aims to identify the most effective feature selection method for opinion mining tasks. Opinion mining is a form of sentiment analysis, which, from a text mining point of view, classifies the opinions expressed in online texts as positive or negative. Using a product review dataset covering five product groups (apparel, books, DVDs, electronics, and kitchen), TF-IDF and Bag-of-Words (BOW) representations are calculated to form the product review feature sets. Next, we apply several feature selection methods to see which one yields the most robust results. The results show that the stacking classifier trained on the features chosen by the Information Gain feature selection method yields the best result.
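The feature construction and selection steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy reviews, the function names, the whitespace tokenizer, and the particular TF-IDF weighting (term frequency times log(N / df)) are all assumptions made for illustration, and the stacking classifier stage is omitted. Information Gain is computed here for the binary feature "term occurs in the review".

```python
import math
from collections import Counter

# Toy labeled reviews (hypothetical data, NOT the paper's dataset).
# Label 1 = positive, 0 = negative.
REVIEWS = [
    ("great book loved the story", 1),
    ("excellent blender works great", 1),
    ("loved this dvd great picture", 1),
    ("terrible book boring story", 0),
    ("blender broke terrible quality", 0),
    ("boring dvd poor picture", 0),
]


def bag_of_words(text):
    """Return raw term counts for one document (whitespace tokenizer)."""
    return Counter(text.split())


def tf_idf(docs):
    """Return one {term: weight} dict per document, weight = tf * ln(N / df).

    This is one common TF-IDF variant, chosen here as an assumption.
    """
    counts = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    return [
        {t: (tf / sum(c.values())) * math.log(n_docs / df[t]) for t, tf in c.items()}
        for c in counts
    ]


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(term, docs, labels):
    """IG of the binary feature 'term occurs in document' w.r.t. the label."""
    present = [y for d, y in zip(docs, labels) if term in d.split()]
    absent = [y for d, y in zip(docs, labels) if term not in d.split()]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Rank the vocabulary by Information Gain and keep the k best terms."""
    vocab = sorted({t for d in docs for t in d.split()})
    ranked = sorted(vocab, key=lambda t: information_gain(t, docs, labels), reverse=True)
    return ranked[:k]


docs = [text for text, _ in REVIEWS]
labels = [label for _, label in REVIEWS]
weights = tf_idf(docs)
selected = select_top_k(docs, labels, 4)
```

On this toy corpus the term "great" appears in every positive review and in no negative one, so it separates the classes perfectly and ranks first. In a fuller pipeline, the BOW or TF-IDF vectors would be restricted to the selected terms before being passed to a classifier.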

Keywords

Opinion mining; Sentiment analysis; Feature selection
