An Automatic Document Summarization System Using a Dynamic Connection Graph

  • Published: 2009.01.15

Abstract

Document summarization aims to help readers grasp a document's content quickly and easily by extracting or generating only the key content from large documents of various forms. In this paper, we propose a summarization method that builds and analyzes a connection graph representing the keyword similarity between sentences, taking into account the mean sentence length (the number of keywords per sentence) of the given document. Using this method, we also develop a system that automatically generates summaries from documents produced by application programs. To evaluate summarization performance objectively, we measured precision, recall, and F-measure on the summaries generated for 20 test documents that come with correct reference summaries. The experimental results show that the proposed method achieves better summarization performance than existing methods.
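The following is a minimal sketch of the general idea, not the authors' implementation. It assumes that each sentence has already been reduced to a keyword list, that two sentences are linked when the number of shared keywords reaches a fraction of the document's mean sentence length, and that the best-connected sentences form the summary. The abstract does not state the actual linking or selection rules, so the function names, the `ratio` parameter, and the degree-based ranking are all hypothetical.

```python
from itertools import combinations


def build_connection_graph(keyword_lists, ratio=0.5):
    """Link sentence pairs that share enough keywords.

    Assumption: the link threshold is `ratio` times the mean number of
    keywords per sentence. The paper's actual rule is not given in the abstract.
    """
    if not keyword_lists:
        return {}
    mean_len = sum(len(k) for k in keyword_lists) / len(keyword_lists)
    threshold = max(1.0, ratio * mean_len)
    graph = {i: set() for i in range(len(keyword_lists))}
    for i, j in combinations(range(len(keyword_lists)), 2):
        shared = len(set(keyword_lists[i]) & set(keyword_lists[j]))
        if shared >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def summarize(graph, size):
    """Return the indices of the `size` best-connected sentences in document order.

    Assumption: ranking by node degree, with ties broken by position.
    """
    ranked = sorted(graph, key=lambda i: (-len(graph[i]), i))
    return sorted(ranked[:size])


def evaluate(selected, reference):
    """Compute precision, recall, and F-measure over sentence indices."""
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy document: four sentences, already reduced to keyword lists.
sentences = [
    ["graph", "summary", "sentence"],
    ["graph", "sentence", "keyword"],
    ["weather", "rain"],
    ["summary", "graph", "keyword"],
]
graph = build_connection_graph(sentences, ratio=0.5)
summary = summarize(graph, size=2)
print(summary)                    # [0, 1]
print(evaluate(summary, [0, 3]))  # (0.5, 0.5, 0.5)
```

Tying the threshold to the mean sentence length means that a document with long sentences needs more shared keywords before two sentences are linked, which is one plausible reading of how sentence length is taken into account.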
