http://dx.doi.org/10.13088/jiis.2015.21.2.69

A Study of 'Emotion Trigger' by Text Mining Techniques  

An, Juyoung (Department of Library and Information Science, College of Liberal Arts, Yonsei University)
Bae, Junghwan (Department of Library and Information Science, College of Liberal Arts, Yonsei University)
Han, Namgi (Department of Library and Information Science, College of Liberal Arts, Yonsei University)
Song, Min (Department of Library and Information Science, Yonsei University)
Publication Information
Journal of Intelligence and Information Systems / v.21, no.2, 2015, pp. 69-92
Abstract
The explosion of social media data has led researchers to apply text-mining techniques to analyze big social media data more rigorously. Although algorithms for social media text analysis have improved, previous approaches still have limitations. In sentiment analysis of social media written in Korean, there are two typical approaches. The first, and most common, is the linguistic approach using machine learning; some studies have added grammatical factors to the feature sets used to train classification models. The second applies semantic analysis methods to sentiment analysis, but it has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to deal with the more extensive semantic features that have been underestimated in existing sentiment analysis. The result of applying Word2Vec is compared with the result of co-occurrence analysis to identify the difference between the two approaches. The results show that, among the related words extracted, those expressing some emotion about the keyword are three times more numerous with Word2Vec than with co-occurrence analysis. The difference comes from Word2Vec's vectorization of semantic features. Word2Vec can therefore be said to capture hidden related words that traditional analysis does not find. In addition, part-of-speech (POS) tagging for Korean is used to detect adjectives as "emotional words". The emotional words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence.
The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets were collected by searching with three keywords (professor, prosecutor, and doctor), chosen because they carry rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords used to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin Hae-chul Sky Hospital, drinking and plastic surgery, rebate), and Prosecutor (lewd behavior, sponsor). The size of the text data is about 100,000 (Professor: 25,720; Doctor: 35,110; Prosecutor: 43,225), and the data were gathered from news, blogs, and Twitter to reflect various levels of public emotion in the text analysis. Gephi (http://gephi.github.io) was used for visualization, and every program used for text processing and analysis was written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can reveal hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that a word extracted by Emotion Trigger processing has a significant causal relationship with the emotional word in a sentence. Future work will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger include Twitter data, which has a number of distinct features that were not dealt with in this study. These features will be considered in further study.
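The Emotion Trigger steps described in the abstract (take an adjective as the emotional word, find its nearest neighbours in the word-vector space, keep only the nouns) can be illustrated with a minimal sketch. This is not the authors' implementation, which was written in Java and used vectors trained by Word2Vec on Korean text. The Python code below assumes hand-made toy vectors and toy POS tags, and the names `VECTORS`, `POS`, `cosine`, and `emotion_triggers` are hypothetical.

```python
import math

# Toy word vectors standing in for trained Word2Vec output (hypothetical values).
VECTORS = {
    "angry":    [0.90, 0.10, 0.00],
    "shameful": [0.85, 0.15, 0.05],
    "bribe":    [0.80, 0.20, 0.10],
    "rebate":   [0.70, 0.30, 0.00],
    "hospital": [0.10, 0.90, 0.20],
    "weather":  [0.00, 0.10, 0.90],
}

# Toy POS tags standing in for the output of a Korean POS tagger (hypothetical).
POS = {
    "angry": "ADJ",
    "shameful": "ADJ",
    "bribe": "NOUN",
    "rebate": "NOUN",
    "hospital": "NOUN",
    "weather": "NOUN",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def emotion_triggers(emotion_word, vectors, pos, topn=2):
    """Return the topn nouns closest to an adjective in the vector space."""
    if pos.get(emotion_word) != "ADJ":
        raise ValueError(f"{emotion_word!r} is not tagged as an adjective")
    query = vectors[emotion_word]
    # Score every other word, keeping nouns only as candidate triggers.
    scored = [
        (word, cosine(query, vec))
        for word, vec in vectors.items()
        if word != emotion_word and pos.get(word) == "NOUN"
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


if __name__ == "__main__":
    for word, score in emotion_triggers("angry", VECTORS, POS):
        print(f"{word}\t{score:.3f}")
```

With these toy values, the nouns ranked closest to "angry" are "bribe" and "rebate", while the adjective "shameful" is filtered out despite being close in the vector space. As the abstract notes, proximity of this kind does not by itself establish a causal relationship between the noun and the emotional word.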
Keywords
Emotion Trigger; Word2Vec; Sentimental Analysis; Text Mining; Social Issues