A Document Sentiment Classification System Based on the Feature Weighting Method Improved by Measuring Sentence Sentiment Intensity

  • Published: 2009.06.15

Abstract

This paper proposes a new feature weighting method for Korean document sentiment classification that takes into account differences in sentiment intensity among the sentences of a document. Sentiment features are a lexical resource: the set of vocabulary words that carry sentiment. The chi-square statistic (${\chi}^2$) of each sentiment feature is estimated from training data, and these values are then used to quantify the sentiment intensity of every sentence that appears in a document. Each sentence's intensity is expressed as a ratio relative to the sentence with the strongest sentiment in the document, and this value is applied to the TF-IDF weighting method to determine the final weight of every feature in the document. The method is evaluated with a support vector machine (SVM), a classifier that generally performs well in document classification. In the experiments, the proposed method performs about 2.0% better than a baseline that uses content-word features without considering sentence sentiment intensity.
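A minimal Python sketch of the weighting pipeline described in the abstract, using only the standard library. The chi-square function is the standard 2×2 term/class statistic. The abstract does not say how feature scores are combined into a sentence score or exactly how the intensity ratio enters TF-IDF, so summing the scores per sentence and scaling each token occurrence by (1 + ratio) are assumptions, and every function name here (`chi_square`, `feature_scores`, `document_frequency`, `sentence_intensities`, `weighted_tfidf`) and the toy data are hypothetical. The SVM training step is omitted; the resulting weight dictionaries would still need to be vectorized and passed to a linear SVM.

```python
import math
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a term/class 2x2 contingency table.

    a: documents of the class that contain the term
    b: documents of other classes that contain the term
    c: documents of the class that do not contain the term
    d: documents of other classes that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def feature_scores(docs, labels, lexicon, target="pos"):
    """Chi-square score of each sentiment-lexicon word against `target`.

    docs: list of documents; each document is a list of tokenized sentences.
    """
    doc_terms = [{tok for sent in doc for tok in sent} for doc in docs]
    scores = {}
    for word in lexicon:
        # Count documents in each cell, keyed by (word present, in target class).
        cells = Counter((word in terms, label == target)
                        for terms, label in zip(doc_terms, labels))
        scores[word] = chi_square(cells[True, True], cells[True, False],
                                  cells[False, True], cells[False, False])
    return scores


def document_frequency(docs):
    """Number of documents in which each token appears."""
    df = Counter()
    for doc in docs:
        df.update({tok for sent in doc for tok in sent})
    return df


def sentence_intensities(doc, scores):
    """Each sentence's intensity as a ratio to the strongest sentence (0..1).

    Assumption: a sentence's raw intensity is the sum of the chi-square
    scores of the sentiment words it contains.
    """
    raw = [sum(scores.get(tok, 0.0) for tok in sent) for sent in doc]
    top = max(raw, default=0.0)
    if top == 0:
        return [0.0] * len(doc)
    return [r / top for r in raw]


def weighted_tfidf(doc, scores, df, n_docs):
    """TF-IDF in which every token occurrence counts (1 + sentence intensity).

    The (1 + r) scaling is a hypothetical choice that keeps tokens from
    neutral sentences (r = 0) in the vector instead of zeroing them out.
    """
    tf = Counter()
    for sent, r in zip(doc, sentence_intensities(doc, scores)):
        for tok in sent:
            tf[tok] += 1.0 + r
    return {t: f * math.log(n_docs / df[t]) for t, f in tf.items() if df[t]}


# Toy usage with hypothetical data: four documents, two sentences at most.
docs = [
    [["great", "movie"], ["story", "ok"]],
    [["boring", "movie"], ["bad", "acting"]],
    [["great", "acting"]],
    [["bad", "story"]],
]
labels = ["pos", "neg", "pos", "neg"]
scores = feature_scores(docs, labels, {"great", "boring", "bad"})
df = document_frequency(docs)
print(sentence_intensities(docs[1], scores))  # → [0.3333333333333333, 1.0]
```

With only two classes, the chi-square value is the same whichever class is chosen as `target` (swapping the classes only flips the sign inside the square), so a single score per sentiment word is enough for this sketch.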
