Topic-based Multi-document Summarization Using Non-negative Matrix Factorization and K-means

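  • Sun Park (박선), Department of Computer Engineering, Honam University
  • Ju-Hong Lee (이주홍), Department of Computer and Information Engineering, Inha University
  • Published: 2008.04.15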

Abstract

This paper proposes a novel method for topic-based multi-document summarization using K-means clustering and non-negative matrix factorization (NMF). NMF decomposes a weighted term-by-sentence matrix into two sparse non-negative matrices: a semantic feature matrix and a semantic variable matrix. The resulting semantic features are intuitively interpretable. Weighting the similarity between the topic and the semantic features prevents the selection of sentences that are similar to the topic but carry little actual meaning. K-means clustering removes noise from the sentences so that the summaries do not reflect a biased view of the documents' semantics. In addition, the coherence of the summaries is improved by arranging the selected sentences in the order of their ranks. Experimental results show that the proposed method outperforms the other methods compared.
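The abstract does not state which update rules are used for the factorization. As a minimal sketch, assuming the standard Lee and Seung multiplicative updates for the squared Frobenius error, the decomposition of a weighted term-by-sentence matrix A (terms × sentences) into a semantic feature matrix W and a semantic variable matrix H could look like the following. It is pure Python to stay dependency-free, and no explicit sparsity penalty is added. The function names (`nmf`, `matmul`, `frobenius_error`), the random initialization, the rank r, and the toy matrix are illustrative assumptions, not the authors' settings.

```python
import random

def matmul(X, Y):
    """Product of X (p x q) and Y (q x s); matrices are lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frobenius_error(A, W, H):
    """Squared Frobenius norm ||A - WH||^2."""
    WH = matmul(W, H)
    return sum((A[i][j] - WH[i][j]) ** 2 for i in range(len(A)) for j in range(len(A[0])))

def nmf(A, r, iterations=200, seed=0, eps=1e-9):
    """Approximate A (terms x sentences) by W (terms x r) times H (r x sentences),
    with W, H >= 0, using Lee-Seung multiplicative updates for ||A - WH||^2.
    Columns of W act as semantic features; column i of H holds the
    semantic variables of sentence i."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Strictly positive random start so multiplicative updates can move every entry.
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H), elementwise; eps avoids division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)] for a in range(r)]
        # W <- W * (A H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(r)] for i in range(m)]
    return W, H

# Toy weighted term-by-sentence matrix: 4 terms x 5 sentences (made-up values).
A = [[2, 1, 0, 0, 1],
     [1, 2, 0, 0, 1],
     [0, 0, 3, 1, 1],
     [0, 0, 1, 2, 1]]
W, H = nmf(A, r=2)
print(round(frobenius_error(A, W, H), 4))
```

Because every update multiplies non-negative entries by non-negative ratios, W and H stay non-negative throughout, which is what makes the extracted features readable as additive combinations of terms.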

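This paper proposes a new method for topic-based multi-document summarization using K-means and non-negative matrix factorization (NMF). The proposed method uses NMF to decompose a weighted term-sentence matrix into a sparse non-negative semantic feature matrix and a non-negative variable matrix, which yields semantic features in an intuitively understandable form. It also weights the similarity between the topic and the semantic features, preventing the extraction of sentences that are highly similar to the topic but are not actually meaningful. In addition, K-means clustering removes the noise contained in sentences, which keeps the meaning of the documents from being reflected in the summary in a biased way, and coherence is improved by presenting the extracted sentences sorted in the order of their assigned ranks. Experimental results show that the proposed method performs better than other methods.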
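The abstract does not define the weighted similarity, which vectors K-means clusters, or how noise is removed, so the following is only a hypothetical sketch of how the remaining steps could fit together. Each semantic feature (column of W) is weighted by its cosine similarity to a topic vector, each sentence is scored through its column of H, sentences are clustered with K-means on those H columns, only the best-scoring sentence of each cluster is kept, and the survivors are listed in rank order. The function names, the scoring formula, the farthest-point initialization, the per-cluster selection rule, and the `length` cap are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def topic_weighted_scores(topic, W, H):
    """Hypothetical scoring: weight each semantic feature (column of W) by its
    cosine similarity to the topic, then score each sentence (column of H)
    as the feature-weighted sum of its semantic variables."""
    r, n = len(H), len(H[0])
    weights = [cosine(topic, [row[a] for row in W]) for a in range(r)]
    return [sum(weights[a] * H[a][i] for a in range(r)) for i in range(n)]

def kmeans(points, k, iterations=20):
    """Lloyd's K-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def summarize(sentences, topic, W, H, k, length):
    scores = topic_weighted_scores(topic, W, H)
    # Represent each sentence by its column of H (its semantic variables).
    points = [[H[a][i] for a in range(len(H))] for i in range(len(sentences))]
    labels = kmeans(points, k)
    # Assumed noise filter: keep only the best-scoring sentence per cluster.
    best = {}
    for i, lab in enumerate(labels):
        if lab not in best or scores[i] > scores[best[lab]]:
            best[lab] = i
    # Order the survivors by rank (highest score first) and cap the length.
    chosen = sorted(best.values(), key=lambda i: scores[i], reverse=True)[:length]
    return [sentences[i] for i in chosen]

# Toy example: 4 terms, 2 semantic features, 5 sentences (all values made up).
W = [[1.0, 0.0], [0.8, 0.1], [0.0, 0.9], [0.1, 1.0]]
H = [[0.9, 0.8, 0.1, 0.0, 0.5],
     [0.1, 0.0, 0.9, 1.0, 0.5]]
topic = [1.0, 1.0, 0.0, 0.0]
sentences = ["S0", "S1", "S2", "S3", "S4"]
print(summarize(sentences, topic, W, H, k=2, length=2))  # ['S0', 'S2']
```

In this toy case the topic aligns with the first semantic feature, so S0 ranks first; the second cluster still contributes its best sentence, S2, which illustrates the coverage-versus-relevance trade-off that a per-cluster rule and a length cap would have to balance.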
