Automatic Construction of a Negative/positive Corpus and Emotional Classification using the Internet Emotional Sign

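  • 장경애 (Industrial Information Systems, Graduate School of IT Policy, Seoul National University of Science and Technology)
  • 박상현 (Department of Computer Science, Yonsei University)
  • 김우제 (Department of Global Convergence Industrial Engineering, Seoul National University of Science and Technology)
  • Received: 2014.10.13
  • Accepted: 2015.02.11
  • Published: 2015.04.15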

Abstract

Internet users buy goods online and express positive or negative feelings about them in product reviews. Analyses of these reviews provide important information both to potential consumers and to enterprises making business decisions. As a result, opinion mining techniques, which derive opinions by extracting meaningful information from large volumes of Internet reviews, have become increasingly important. Most existing studies have focused on English, and Korean product reviews have not yet been analyzed extensively. Unlike English, Korean has complex modifiers and verb endings. In addition, existing studies have relied on statistical, dictionary-based, and machine-learning techniques without taking the characteristics of Internet language into account. This study proposes an emotional classification method that improves classification accuracy by analyzing the characteristics of emotion-bearing Internet language. Using data-independent Internet emotional signs (emoticons), the method automatically classifies product reviews as positive or negative, and the high precision, recall, and coverage obtained in the evaluation confirm the validity of the proposed algorithm.
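The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, assuming a simple majority vote over emoticons to label reviews and build positive/negative sub-corpora. The sign lists, the function names (`label_review`, `build_corpus`, `evaluate`), the tie rule, and the exact definitions of recall and coverage are hypothetical illustrations, not the authors' method.

```python
from typing import Optional

# HYPOTHETICAL sign inventories: the paper's actual set of Internet emotional
# signs is not listed in the abstract; these are illustrative examples only.
POSITIVE_SIGNS = ("^^", "^_^", "ㅎㅎ", ":)")
NEGATIVE_SIGNS = ("ㅠㅠ", "ㅜㅜ", "-_-", ":(")


def count_signs(text: str, signs: tuple[str, ...]) -> int:
    """Count non-overlapping occurrences of every sign in the text."""
    return sum(text.count(sign) for sign in signs)


def label_review(text: str) -> Optional[str]:
    """Return 'pos' or 'neg' by majority of emotional signs, or None if undecided."""
    pos = count_signs(text, POSITIVE_SIGNS)
    neg = count_signs(text, NEGATIVE_SIGNS)
    if pos > neg:
        return "pos"
    if neg > pos:
        return "neg"
    return None  # no signs, or a tie: the review is left out of the corpus


def build_corpus(reviews: list[str]) -> dict[str, list[str]]:
    """Split reviews into positive and negative sub-corpora using signs only."""
    corpus: dict[str, list[str]] = {"pos": [], "neg": []}
    for review in reviews:
        label = label_review(review)
        if label is not None:
            corpus[label].append(review)
    return corpus


def evaluate(
    predicted: list[Optional[str]], gold: list[str], target: str = "pos"
) -> tuple[float, float, float]:
    """Precision and recall for one class, plus coverage over all reviews.

    ASSUMPTION: unlabeled (None) predictions count against recall, and
    coverage is the share of reviews that received any label at all.
    """
    tp = sum(1 for p, g in zip(predicted, gold) if p == target and g == target)
    predicted_target = sum(1 for p in predicted if p == target)
    gold_target = sum(1 for g in gold if g == target)
    labeled = sum(1 for p in predicted if p is not None)
    precision = tp / predicted_target if predicted_target else 0.0
    recall = tp / gold_target if gold_target else 0.0
    coverage = labeled / len(gold) if gold else 0.0
    return precision, recall, coverage
```

For example, `label_review("좋아요 ㅎㅎ 근데 포장은 ㅠㅠ")` returns `None` because one positive and one negative sign tie, so that review stays outside the corpus and lowers coverage rather than precision. Treating ties as unlabeled trades coverage for precision, which is one reason the three measures are reported together.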
