An Empirical Comparison of Machine Learning Models for Classifying Emotions in Korean Twitter

Lim, Joa-Sang; Kim, Jin-Man

doi:10.9717/kmms.2014.17.2.232

Journal of Korea Multimedia Society

Vol. 17, No. 2 / pp. 232-239 / 2014 / 1229-7771 (pISSN) / 2384-0102 (eISSN)

Korea Multimedia Society

한국어 트위터의 감정 분류를 위한 기계학습의 실증적 비교

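Joa-Sang Lim (Department of Media Software, Sangmyung University);
Jin-Man Kim (Department of Computer Science, Graduate School, Sangmyung University)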

Received: 2014.01.15
Reviewed: 2014.02.19
Published: 2014.02.28

https://doi.org/10.9717/kmms.2014.17.2.232

Abstract

As online writing grows rapidly, a growing body of research uses machine learning to classify it automatically. Nevertheless, comparatively few studies have targeted microblogs written in Korean, and studies that evaluate machine learning methods statistically are also rare. This study drew a sample of tweets and classified the emotions they contain with machine learning methods, using morphemes and syllable n-grams as features. About 76% of the emotions contained in the tweets were classified correctly. Of the two methods compared, Support Vector Machines were more accurate than Naïve Bayes. The linear SVM model was not inferior to the non-linear one on this unstructured text. Morphological features did not yield higher accuracy than n-gram features.
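The feature and classifier setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes syllable bigrams as features (each precomposed Hangul syllable treated as one character), a multinomial Naïve Bayes classifier with add-one smoothing, and a four-tweet toy training set invented purely for illustration. The names `syllable_ngrams`, `MultinomialNaiveBayes`, and `TRAIN` are hypothetical.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def syllable_ngrams(text, n=2):
    """Return overlapping syllable n-grams, ignoring whitespace.

    Assumption: after NFC normalization each Hangul syllable is a single
    code point, so character n-grams are syllable n-grams.
    """
    chars = [c for c in unicodedata.normalize("NFC", text) if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for f in feats:
                if f in self.vocab:                  # skip unseen n-grams
                    score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, invented for illustration only.
TRAIN = [
    ("오늘 정말 기뻐요", "positive"),
    ("너무 행복해요", "positive"),
    ("정말 슬퍼요", "negative"),
    ("너무 화가 나요", "negative"),
]

model = MultinomialNaiveBayes().fit(
    [syllable_ngrams(text) for text, _ in TRAIN],
    [label for _, label in TRAIN],
)

for text in ("기뻐요", "슬퍼요"):
    print(text, model.predict(syllable_ngrams(text)))
```

Swapping `syllable_ngrams` for a morphological analyzer's output would give the morpheme-feature variant that the abstract compares against; the classifier itself only needs a list of feature strings per tweet.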
