
Multi-document Summarization Based on Cluster using Term Co-occurrence  

Lee, Il-Joo (Department of Mobile Contents, Dongwon College)
Kim, Min-Koo (Department of Computer Engineering, Ajou University)
Abstract
In multi-document summarization based on salient sentence extraction, it is important to remove redundant information, which requires considering the similarities and differences between sentences. In this paper, we propose a multi-document summarization method that extracts salient sentences while avoiding redundant ones, using a cohesive term clustering method based on co-occurrence information. The cohesive term clustering method assumes that terms do not exist independently but are related to one another in meaning. To find the relations between terms, we cluster sentences by topic and use the co-occurrence information of terms within the same topic. We evaluate the method on DUC (Document Understanding Conferences) data. In these experiments, our method outperforms summarization methods that use term co-occurrence information based on term cohesion at the document or sentence level, as well as methods that use simple statistical information.
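The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the scoring formula, or the redundancy test, so everything below is an assumption made for illustration. Topic clusters are taken as given (a hypothetical `clusters` dictionary), co-occurrence is counted as two terms appearing in the same sentence within a topic, a sentence is scored by the summed co-occurrence counts of its term pairs, and redundancy is approximated with Jaccard overlap against sentences already selected.

```python
from collections import Counter
from itertools import combinations


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,;:!?()\"'").lower() for w in sentence.split())
    return [w for w in words if w]


def cooccurrence_counts(clusters):
    """For each topic, count term pairs that share a sentence (assumed definition)."""
    counts = {}
    for topic, sentences in clusters.items():
        pair_counts = Counter()
        for sentence in sentences:
            terms = sorted(set(tokenize(sentence)))
            pair_counts.update(combinations(terms, 2))
        counts[topic] = pair_counts
    return counts


def sentence_score(sentence, pair_counts):
    """Sum the topic-level co-occurrence counts of the sentence's term pairs."""
    terms = sorted(set(tokenize(sentence)))
    return sum(pair_counts[pair] for pair in combinations(terms, 2))


def jaccard(a, b):
    """Term-set overlap, used here as a stand-in redundancy measure."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def summarize(clusters, max_sentences=2, redundancy_threshold=0.5):
    """Greedily pick high-scoring sentences, skipping near-duplicates."""
    counts = cooccurrence_counts(clusters)
    scored = [
        (sentence_score(s, counts[topic]), s)
        for topic, sentences in clusters.items()
        for s in sentences
    ]
    scored.sort(key=lambda item: -item[0])  # highest score first, stable
    summary = []
    for _, sentence in scored:
        if all(jaccard(sentence, chosen) < redundancy_threshold for chosen in summary):
            summary.append(sentence)
        if len(summary) == max_sentences:
            break
    return summary


if __name__ == "__main__":
    # Hypothetical, hand-assigned topic clusters (not DUC data).
    clusters = {
        "quake": [
            "The earthquake struck the city",
            "The earthquake damaged the city",
            "Rescue teams arrived",
        ],
        "aid": ["Aid shipments reached the region"],
    }
    for line in summarize(clusters, max_sentences=3):
        print(line)
```

In this sketch the raw pair-count score favors longer sentences, and the `redundancy_threshold` value is arbitrary. A real system would likely normalize scores and tune the threshold on held-out data.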
Keywords
multi-document summarization; co-occurrence information; cohesion