http://dx.doi.org/10.3745/KTSDE.2020.9.2.53

Generating a Korean Sentiment Lexicon Through Sentiment Score Propagation  

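Park, Ho-Min (Department of Computer Engineering, Korea Maritime and Ocean University)
Kim, Chang-Hyun (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute)
Kim, Jae-Hoon (Department of Computer Engineering, Korea Maritime and Ocean University)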
Publication Information
KIPS Transactions on Software and Data Engineering / v.9, no.2, 2020, pp. 53-60
Abstract
Sentiment analysis is the automated process of understanding attitudes and opinions about a given topic from written or spoken text. One approach to sentiment analysis is dictionary-based, in which a sentiment dictionary plays an important role. In this paper, we propose a method to automatically generate a Korean sentiment lexicon from the well-known English sentiment lexicon VADER (Valence Aware Dictionary and sEntiment Reasoner). The proposed method consists of three steps. The first step builds a Korean-English bilingual lexicon from a Korean-English parallel corpus; the bilingual lexicon is a set of pairs of VADER sentiment words and Korean morphemes, where the Korean morphemes serve as candidate Korean sentiment words. The second step constructs a bilingual word graph from the bilingual lexicon. The third step runs the label propagation algorithm over the bilingual graph. Finally, a new Korean sentiment lexicon is generated by repeatedly applying the propagation algorithm until the values of all vertices converge. Empirically, a dictionary-based sentiment classifier using the generated Korean sentiment lexicon outperforms machine learning-based approaches on the KMU sentiment corpus and the Naver sentiment corpus. In the future, we will apply the proposed approach to generate multilingual sentiment lexica.
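The propagation step described in the abstract can be illustrated with a minimal sketch. It assumes a generic scheme in which seed (English) vertices are clamped to their known scores and every other vertex repeatedly takes the weighted average of its neighbors until the values stop changing. The abstract does not give the paper's actual update rule or edge weighting, so the function name `propagate_scores`, the edge weights, and the seed scores below are all hypothetical and chosen only for illustration.

```python
def propagate_scores(edges, seeds, tol=1e-6, max_iter=1000):
    """Spread seed sentiment scores over an undirected weighted graph.

    edges: dict mapping (u, v) -> positive edge weight (hypothetical weights)
    seeds: dict mapping seed vertex -> known score, held fixed every iteration
    """
    # Build a symmetric adjacency map from the edge list.
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w

    # Seeds start at their known score; all other vertices start neutral.
    scores = {node: seeds.get(node, 0.0) for node in neighbors}

    for _ in range(max_iter):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Clamp seed vertices to their known score.
                updated[node] = seeds[node]
            else:
                # Weighted average of the neighbors' current scores.
                total = sum(nbrs.values())
                updated[node] = sum(w * scores[n] for n, w in nbrs.items()) / total
        # Stop once no vertex moved by more than the tolerance.
        delta = max(abs(updated[n] - scores[n]) for n in scores)
        scores = updated
        if delta < tol:
            break
    return scores


# Toy bilingual graph: English seed words linked to Korean morphemes.
# The seed scores and edge weights are made up, not taken from VADER or the paper.
seeds = {"good": 2.0, "bad": -2.0}
edges = {
    ("good", "좋"): 1.0,
    ("bad", "나쁘"): 1.0,
    ("good", "괜찮"): 3.0,
    ("bad", "괜찮"): 1.0,
}
scores = propagate_scores(edges, seeds)

# Keep only the non-seed vertices as the induced Korean lexicon.
lexicon = {word: s for word, s in scores.items() if word not in seeds}
```

In this toy graph, a morpheme linked to a single seed inherits that seed's score, while a morpheme linked to both seeds lands between them in proportion to the edge weights. Clamping the seeds keeps the known English scores from drifting during propagation; whether the paper clamps its seeds in the same way is not stated in the abstract.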
Keywords
Sentiment Lexicon; Sentiment Analysis; Word Embedding; Label Propagation; Word Graph