http://dx.doi.org/10.14400/JDC.2019.17.3.107

Exploring Feature Selection Methods for Effective Emotion Mining  

Eo, Kyun Sun (KK Business School, Sungkyunkwan University)
Lee, Kun Chang (Global Business Administration / Dept. of Health Sciences & Technology, SAIHST, Sungkyunkwan University)
Publication Information
Journal of Digital Convergence, v.17, no.3, 2019, pp. 107-117
Abstract
In the era of SNSs (social networking services), many people rely on them to express their emotions about various products and services. For companies eager to learn how their products and services are perceived in the market, emotion mining on SNS datasets has therefore become more important than ever. Emotion mining is a branch of sentiment analysis that relies on BOW (bag-of-words) and TF-IDF representations. However, few emotion mining studies have adopted feature selection (FS) methods to search for an optimal feature set that yields better results. This study therefore proposes FS methods for performing emotion mining tasks more effectively and with better outcomes. We use the Twitter and SemEval2007 datasets for the emotion mining experiments and apply three FS methods: CFS (correlation-based FS), IG (information gain), and ReliefF. Emotion mining results were obtained by applying the selected features to nine classifiers. When DT (decision tree) is applied to the Twitter dataset, accuracy increases with the CFS, IG, and ReliefF methods. When LR (logistic regression) is applied to the SemEval2007 dataset, accuracy increases with the ReliefF method.
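Of the three FS methods named above, IG is the simplest to illustrate. The minimal pure-Python sketch below assumes each document is a set of tokens (a binary bag-of-words) and scores each term independently by how much knowing whether it occurs reduces the entropy of the emotion labels: IG(t) = H(Y) - sum over v in {present, absent} of P(v) * H(Y | v). The function names and the toy tweets are hypothetical. The abstract does not say how the study computed IG (for example, which toolkit or term weighting it used), so this illustrates the general technique and is not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term appears in the document'.

    docs: list of token sets (bag-of-words); labels: one emotion label per doc.
    """
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = 0.0
    for subset in (present, absent):
        if subset:  # an empty partition has P(v) = 0 and contributes nothing
            conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def select_top_terms(docs, labels, k):
    """Rank the vocabulary by IG (ties broken alphabetically) and keep the top k."""
    vocab = sorted({t for d in docs for t in d})
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Hypothetical toy tweets, already tokenized; not data from the study.
    docs = [
        {"love", "this", "phone"},
        {"happy", "love", "it"},
        {"hate", "this", "service"},
        {"angry", "hate", "delay"},
    ]
    labels = ["joy", "joy", "anger", "anger"]
    print(select_top_terms(docs, labels, 2))  # ['hate', 'love']
```

In the toy data, "love" and "hate" each split the labels perfectly (IG = 1 bit), while "this" occurs once per class and carries no information (IG = 0). The top-k terms would then form the reduced feature set passed to a classifier such as DT or LR. CFS and ReliefF are not shown: CFS additionally penalizes redundancy among the selected features, and ReliefF weights features by how well they separate nearest neighbors of different classes.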
Keywords
Text mining; Feature selection; Sentiment analysis; Emotion mining; Classifiers
Citations & Related Records
Times Cited By KSCI: 1