Multi-document Summarization Based on Cluster using Term Co-occurrence

Lee, Il-Joo; Kim, Min-Koo

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 33, Issue 2 / Pages 243-251 / 2006 / pISSN 1229-6848

Korean Institute of Information Scientists and Engineers (한국정보과학회)

단어의 공기정보를 이용한 클러스터 기반 다중문서 요약

Lee, Il-Joo (Department of Mobile Contents, Dongwon College);
Kim, Min-Koo (Department of Computer Engineering, Ajou University)

Published : 2006.02.01

Abstract

In multi-document summarization by salient sentence extraction, removing redundant information is essential, because similar content recurs across documents. Removing it requires considering both the similarities and the differences between sentences. In this paper, we propose a multi-document summarization method that extracts salient sentences while excluding redundant ones, using a cohesive term clustering method based on term co-occurrence information. The cohesive term clustering method assumes that terms do not exist independently but are semantically related to one another. To find the relations between terms, we cluster sentences by topic and use the co-occurrence information of terms within the same topic. We evaluate the method on the DUC (Document Understanding Conferences) data. In these experiments, our method summarizes better than methods that use simple statistical information or term co-occurrence information based on term cohesion at the document or sentence level.

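Multi-document summarization by representative sentence extraction needs an efficient way to resolve information redundancy, in which similar information appears repeatedly across several documents, by considering the similarities and differences between sentences. This paper proposes a multi-document summarization method that removes sentence redundancy and extracts important sentences using a related-term clustering technique based on term co-occurrence information. The related-term clustering technique regards terms not as independent of one another but as semantically related, and it uses term cohesion at the level of topic-based sentence clusters. In experiments on the DUC (Document Understanding Conferences) evaluation data, the proposed method, which uses term co-occurrence information at the sentence-cluster level, produced better results than multi-document summarization based on simple statistical information, document-level term co-occurrence information, or sentence-level term co-occurrence information.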
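
The pipeline outlined in the abstract (cluster sentences by topic, measure term co-occurrence inside each cluster, then extract one salient sentence per cluster) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm, the similarity measure, or the scoring formula, so the greedy single-pass clustering, the Jaccard similarity, the `threshold` value, the tiny stopword list, and every function name below are assumptions chosen only to make the idea concrete.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "and", "to",
             "is", "are", "was", "were", "by", "for", "all"}


def tokenize(sentence):
    """Return the set of lowercase content terms in a sentence."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two term sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sentences(sentences, threshold=0.2):
    """Greedy single-pass clustering: each sentence joins its most similar
    cluster, or starts a new one if no similarity reaches the threshold."""
    clusters = []  # each entry: {"terms": set of terms, "members": [indices]}
    for idx, sentence in enumerate(sentences):
        terms = tokenize(sentence)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(terms, cluster["terms"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None or best_sim < threshold:
            clusters.append({"terms": set(terms), "members": [idx]})
        else:
            best["terms"] |= terms
            best["members"].append(idx)
    return [cluster["members"] for cluster in clusters]


def cooccurrence_weights(term_sets):
    """Weight each term by the number of term pairs it takes part in,
    counted over the sentences of a single cluster."""
    weights = Counter()
    for terms in term_sets:
        for a, b in combinations(sorted(terms), 2):
            weights[a] += 1
            weights[b] += 1
    return weights


def summarize(sentences, threshold=0.2):
    """Pick one sentence per topic cluster, so near-duplicates are dropped."""
    summary = []
    for members in cluster_sentences(sentences, threshold):
        weights = cooccurrence_weights([tokenize(sentences[i]) for i in members])

        def score(i):
            # Mean term weight, so long sentences are not favored outright.
            terms = tokenize(sentences[i])
            return sum(weights[t] for t in terms) / len(terms) if terms else 0.0

        summary.append(sentences[max(members, key=score)])
    return summary


if __name__ == "__main__":
    docs = [
        "The earthquake struck the coastal city at dawn.",
        "An earthquake struck the city.",
        "Rescue teams searched collapsed buildings for survivors.",
        "Rescue teams searched the collapsed buildings all night for survivors.",
    ]
    for line in summarize(docs):
        print(line)
```

Because exactly one sentence is taken from each cluster, redundancy is handled by the clustering step and salience by the within-cluster co-occurrence weights. A document-level or sentence-level variant, the baselines the abstract compares against, would differ mainly in the unit over which `cooccurrence_weights` is computed.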
