Generating a Korean Sentiment Lexicon Through Sentiment Score Propagation

감정점수의 전파를 통한 한국어 감정사전 생성

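  • Ho-Min Park (박호민; Department of Computer Engineering, Korea Maritime and Ocean University) ;
  • Chang-Hyun Kim (김창현; Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Jae-Hoon Kim (김재훈; Department of Computer Engineering, Korea Maritime and Ocean University)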
  • Received : 2019.07.08
  • Accepted : 2019.10.15
  • Published : 2020.02.29

Abstract

Sentiment analysis is the automated process of identifying attitudes and opinions about a given topic in written or spoken text. One approach is dictionary-based, in which a sentiment lexicon plays an important role. In this paper, we propose a method for automatically generating a Korean sentiment lexicon from VADER (Valence Aware Dictionary and sEntiment Reasoner), a well-known English sentiment lexicon. The proposed method consists of three steps. The first step builds a Korean-English bilingual lexicon from a Korean-English parallel corpus; the bilingual lexicon is a set of pairs of VADER sentiment words and Korean morphemes, which serve as candidate Korean sentiment words. The second step constructs a bilingual word graph from the bilingual lexicon. The third step runs the label propagation algorithm over the bilingual graph, applying it repeatedly until the values of all vertices converge, which yields the new Korean sentiment lexicon. Empirically, a dictionary-based sentiment classifier using the generated Korean sentiment lexicon outperforms machine learning-based approaches on the KMU sentiment corpus and the Naver sentiment corpus. In future work, we will apply the proposed approach to generate multilingual sentiment lexica.
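The propagation step can be illustrated with a minimal sketch. It assumes an undirected weighted graph, English seed words whose scores stay fixed, and a weighted-average update for every unlabeled vertex. The function name `propagate`, the toy words, the edge weights, and the seed scores are all hypothetical and are not taken from the paper or from VADER. The paper's actual graph construction and update rule may differ.

```python
# Minimal label-propagation sketch on a toy bilingual word graph.
# All words, edge weights, and seed scores are illustrative assumptions.

def propagate(edges, seeds, tol=1e-6, max_iter=1000):
    """Spread seed scores over an undirected weighted graph until convergence."""
    # Build a symmetric adjacency map: node -> {neighbor: weight}.
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w

    # Seed vertices start at their scores; every other vertex starts at 0.
    scores = {node: seeds.get(node, 0.0) for node in graph}

    for _ in range(max_iter):
        new_scores = {}
        for node, nbrs in graph.items():
            if node in seeds:
                # Clamp labeled (seed) vertices to their original score.
                new_scores[node] = seeds[node]
            else:
                # Unlabeled vertex: weighted average of neighbor scores.
                total = sum(nbrs.values())
                new_scores[node] = sum(w * scores[n] for n, w in nbrs.items()) / total
        # Stop once no vertex changed by more than the tolerance.
        delta = max(abs(new_scores[n] - scores[n]) for n in graph)
        scores = new_scores
        if delta < tol:
            break
    return scores


# Hypothetical edges between English seed words and Korean morphemes.
edges = [
    ("good", "좋다", 0.9),
    ("great", "좋다", 0.6),
    ("bad", "나쁘다", 0.8),
    ("good", "괜찮다", 0.5),
    ("bad", "괜찮다", 0.1),
]
seeds = {"good": 1.9, "great": 3.1, "bad": -2.5}  # made-up seed scores

scores = propagate(edges, seeds)
# Keep only the non-seed (Korean) vertices as the generated lexicon.
korean_lexicon = {w: s for w, s in scores.items() if w not in seeds}
print(korean_lexicon)
```

Clamping the seed vertices keeps the English scores as fixed anchors, so each Korean morpheme ends up with a score weighted toward the English words it is most strongly linked to.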

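Sentiment analysis is the process of understanding attitudes and opinions about a given topic in documents or conversations. There are various approaches to sentiment analysis, one of which is the dictionary-based approach that uses a sentiment lexicon. In this paper, we propose a method for automatically generating a Korean sentiment lexicon from VADER, a well-known English sentiment lexicon. The proposed method consists of three steps. The first step builds a Korean-English bilingual lexicon from a Korean-English parallel corpus. The resulting bilingual lexicon is a set of pairs of VADER sentiment words and Korean morphemes. The second step uses the bilingual lexicon to construct a Korean-English word graph. The third step runs the label propagation algorithm on the word graph to build a new sentiment lexicon. We conducted several experiments to show the usefulness of the Korean sentiment lexicon generated in this way. A sentiment classifier using the generated lexicon outperformed existing machine learning-based sentiment classifiers. In future work, we plan to apply the proposed method to generate sentiment lexicons for multiple languages.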
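As a rough illustration of how a dictionary-based sentiment classifier can use such a lexicon, the sketch below averages the scores of the morphemes found in the lexicon and labels the input by the sign of that average. It is not the classifier evaluated in the paper. The function name `classify`, the lexicon entries, and the pre-tokenized inputs are hypothetical, and a real system would first run a Korean morphological analyzer.

```python
# Toy dictionary-based sentiment classifier.
# The lexicon entries and the pre-tokenized inputs are hypothetical.

def classify(morphemes, lexicon, threshold=0.0):
    """Label a morpheme sequence by the mean score of its lexicon hits."""
    hits = [lexicon[m] for m in morphemes if m in lexicon]
    if not hits:
        return "neutral"  # no sentiment-bearing morpheme found
    mean = sum(hits) / len(hits)
    if mean > threshold:
        return "positive"
    if mean < -threshold:
        return "negative"
    return "neutral"


lexicon = {"좋다": 2.4, "나쁘다": -2.5, "괜찮다": 1.2}  # made-up scores

print(classify(["영화", "좋다"], lexicon))    # one positive hit
print(classify(["연기", "나쁘다"], lexicon))  # one negative hit
print(classify(["영화", "연기"], lexicon))    # no hits
```

Averaging instead of summing keeps long inputs from dominating purely through length. The `threshold` parameter is an assumption that leaves room for a neutral band around zero.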
