Topic-based Multi-document Summarization Using Non-negative Matrix Factorization and K-means  

Park, Sun (Department of Computer Engineering, Honam University)
Lee, Ju-Hong (Department of Computer and Information Engineering, Inha University)
Abstract
This paper proposes a novel method for topic-based multi-document summarization that combines K-means clustering with non-negative matrix factorization (NMF). NMF decomposes a weighted term-by-sentence matrix into two sparse non-negative matrices: a semantic feature matrix and a semantic variable matrix. The resulting semantic features are intuitively comprehensible. Weighting the similarity between the topic and the semantic features prevents sentences that resemble the topic but carry little meaning from being selected. K-means clustering removes noisy sentences so that biased semantics of the documents are not reflected in the summaries. In addition, arranging the selected sentences in rank order improves the coherence of the summaries. Experimental results show that the proposed method outperforms other methods.
Keywords
multi-document summarization; non-negative matrix factorization; clustering; topic-based summarization; weighted similarity
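
The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes the standard multiplicative update rules for NMF (Lee and Seung) and plain cosine similarity between a topic vector and each semantic feature. The toy matrix `A`, the `topic` vector, and all function names are hypothetical; the paper's actual term weighting, similarity weighting, and K-means step are not reproduced here.

```python
import random

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(A, r, iters=200, seed=0, eps=1e-9):
    """Factor non-negative A (m x n) into W (m x r) and H (r x n) with A ~ W H.

    Uses multiplicative updates for the Frobenius-norm objective.
    W plays the role of the semantic feature matrix and H the
    semantic variable matrix.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, A)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(r)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(A, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(m)]
    return W, H

def frobenius_error(A, W, H):
    """Frobenius norm of the reconstruction error A - W H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(y * y for y in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical weighted term-by-sentence matrix: 4 terms (rows) x 3 sentences (columns).
A = [[1, 0, 2],
     [0, 3, 0],
     [1, 0, 1],
     [0, 1, 0]]
topic = [1, 0, 1, 0]  # hypothetical topic expressed over the same 4 terms

W, H = nmf(A, r=2)

# Similarity of the topic to each semantic feature (each column of W).
feature_sims = [cosine(topic, col) for col in transpose(W)]

# Score each sentence by its semantic variables weighted by those similarities.
sentence_scores = [sum(feature_sims[k] * H[k][j] for k in range(len(H)))
                   for j in range(len(A[0]))]
ranking = sorted(range(len(sentence_scores)), key=lambda j: -sentence_scores[j])
```

Here `ranking` orders sentence indices from most to least topic-relevant under these assumptions; a fuller pipeline would first cluster sentences with K-means to drop noisy ones before scoring.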