Automatic Extraction of Opinion Words from Korean Product Reviews Using the k-Structure

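  • Hanhoon Kang (Department of Computer Engineering, Sejong University)
  • Seong Joon Yoo (Department of Computer Engineering, Sejong University)
  • Dongil Han (Department of Computer Engineering, Sejong University)
  • Submitted: 2009.12.08
  • Reviewed: 2010.04.02
  • Published: 2010.06.15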

Abstract

Most methods proposed in English-language research for extracting opinion words are difficult to apply directly to Korean. Among the methods proposed in Korean-language research, manual approaches take a long time to extract opinion words. Techniques that extract Korean opinion words from an English thesaurus must address the loss of accuracy caused by the lack of a one-to-one correspondence between Korean and English words. Studies based on Korean syntactic parsers may fail to select opinion words that occur with low frequency. This paper proposes the k-Structure (k=5 or 8) technique, which can improve accuracy in a way that complements existing Korean-language methods when automatically extracting opinion words from simple sentences in Korean product reviews. A simple sentence is one whose pattern length is at most 3, that is, a sentence containing an opinion word within a distance of ${\pm}2$ of an attribute name f (e.g., the 'battery' of a camera) of the product being evaluated (e.g., a 'camera'). In the performance experiment, opinion words for 8 predefined attribute names were extracted automatically with k-Structure from 1,868 product reviews collected from major Korean shopping malls, and the accuracy of the extraction was evaluated. With k=5, the method achieved an average recall of 79.0% and an average precision of 87.0%; with k=8, it achieved an average recall of 92.35% and an average precision of 89.3%. In addition, an experiment was run with PMI-IR (Pointwise Mutual Information - Information Retrieval), one of the methods proposed in English-language research, which yielded an average recall of 55% and an average precision of 57%.
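The abstract does not spell out how the k-Structure itself is built, so the following is only a minimal sketch of the two ideas it does state: collecting candidate words within a distance of 2 of an attribute name, and scoring extracted opinion words by precision and recall. The function names, the pre-tokenized input, and the example tokens are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Set, Tuple


def window_candidates(tokens: List[str], attribute: str,
                      max_distance: int = 2) -> List[Tuple[int, str]]:
    """Return (offset, token) pairs lying within max_distance of each
    occurrence of the attribute name. Hypothetical helper: it assumes the
    review has already been split into morpheme-level tokens."""
    candidates = []
    for i, token in enumerate(tokens):
        if token != attribute:
            continue
        start = max(0, i - max_distance)
        end = min(len(tokens), i + max_distance + 1)
        for j in range(start, end):
            if j != i:  # skip the attribute name itself
                candidates.append((j - i, tokens[j]))
    return candidates


def precision_recall(extracted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision = correct / extracted, recall = correct / gold."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up, pre-tokenized review fragment; '배터리' (battery) is the attribute.
    tokens = ["이", "카메라", "배터리", "정말", "좋다"]
    print(window_candidates(tokens, "배터리"))
    # Made-up extracted and gold opinion-word sets.
    print(precision_recall({"좋다", "정말"}, {"좋다", "짧다"}))
```

Which candidates count as opinion words is the part the k-Structure decides; this sketch deliberately leaves that step out and only shows the window and the evaluation metrics.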
