Feature-Based Summarization Method for a Large Opinion Documents Collection

  • Jae-Young Chang (장재영), Department of Computer Engineering, Hansung University
  • Submitted: 2015.11.25
  • Reviewed: 2016.02.05
  • Published: 2016.02.29

Abstract

Recently, environments in which the public can express opinions on a wide range of topics have expanded, particularly on social networking services (SNSs) and Internet portals, and as a result the volume of opinion documents is growing rapidly. Understanding the content of such large opinion document collections requires automatic summarization. However, opinion documents contain both the features of the target objects and subjective expressions about those features, so traditional text summarization techniques cannot summarize them effectively. This paper proposes a method that summarizes a large collection of opinion documents by extracting its key sentences. The method is designed so that, for each predefined feature of the opinion documents, representative sentences expressing opinions about that feature are extracted. Experiments demonstrate the usefulness of the proposed method.
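
The abstract describes the method only at a high level. As an illustration of the general idea of feature-based extractive summarization, here is a minimal sketch built on our own assumptions, not the paper's algorithm: each feature is given as a hand-written keyword set; a sentence counts as an opinion on a feature if it contains one of that feature's keywords and a word from a small opinion lexicon; and the candidate sentences for each feature are ranked with a TextRank-style PageRank over word-overlap (Jaccard) similarity. The names `tokenize`, `textrank`, `summarize_by_feature`, the toy lexicon, and the regex tokenizer (English only; Korean text would need morphological analysis) are all hypothetical.

```python
import re
from itertools import combinations


def tokenize(sentence: str) -> set[str]:
    """Lowercased word set (assumption: a stand-in for real morphological analysis)."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def textrank(token_sets: list[set[str]], damping: float = 0.85,
             iterations: int = 50) -> list[float]:
    """TextRank-style PageRank over sentences linked by Jaccard word overlap."""
    n = len(token_sets)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        union = token_sets[i] | token_sets[j]
        if union:
            w = len(token_sets[i] & token_sets[j]) / len(union)
            weights[i][j] = weights[j][i] = w
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours in proportion to edge weight.
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize_by_feature(sentences: list[str], features: dict[str, set[str]],
                         opinion_words: set[str], top_k: int = 1) -> dict[str, list[str]]:
    """Pick the top_k most central opinion sentences for each predefined feature."""
    summary: dict[str, list[str]] = {}
    for feature, keywords in features.items():
        # Hypothetical rule: mentions a feature keyword AND contains an opinion word.
        candidates = []
        for s in sentences:
            toks = tokenize(s)
            if toks & keywords and toks & opinion_words:
                candidates.append((s, toks))
        if not candidates:
            summary[feature] = []
            continue
        scores = textrank([toks for _, toks in candidates])
        ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
        summary[feature] = [candidates[i][0] for i in ranked[:top_k]]
    return summary


# Toy inputs (hypothetical): features as keyword sets plus a tiny opinion lexicon.
features = {"battery": {"battery"}, "screen": {"screen", "display"}, "price": {"price", "cost"}}
opinion_words = {"great", "poor", "bright", "sharp", "dull"}
reviews = [
    "The battery life is great and lasts all day.",
    "Great battery life, it lasts a full day.",
    "The battery is poor in cold weather.",
    "I replaced the battery last month.",
    "The screen is bright and sharp.",
    "Screen colors look dull.",
]

for feature, picked in summarize_by_feature(reviews, features, opinion_words).items():
    print(feature, picked)
```

Ranking each feature's own candidate set separately is what makes the output feature-based in this sketch: every predefined feature gets its own representative sentences, rather than one global summary dominated by the most frequently discussed feature.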
