http://dx.doi.org/10.6109/jkiice.2012.16.12.2607

Generic Document Summarization using Coherence of Sentence Cluster and Semantic Feature  

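Park, Sun (Information Industry Research Institute, Mokpo National University)
Lee, Yeonwoo (Department of Information and Communication, Mokpo National University)
Shim, Chun Sik (Department of Shipbuilding Engineering, Mokpo National University)
Lee, Seong Ro (Department of Information and Electronics Engineering, Mokpo National University)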
Abstract
The results of generic summarization based on inherent knowledge are influenced by the composition of the sentences in the document set. To resolve this problem, this paper proposes a new generic document summarization method that uses clustering of the document's semantic features and the coherence of sentence clusters. The proposed method clusters sentences using semantic features derived from NMF (non-negative matrix factorization); because the sentence clusters represent the inherent structure of the document well, the method can identify the document's topic groups. In addition, the method can improve the quality of the summary because important sentences are extracted using the coherence of the sentence clusters, and the clusters are refined by re-clustering. The experimental results demonstrate that applying the proposed method to generic summarization achieves better performance than other generic document summarization methods.
Keywords
generic summarization; semantic feature; coherence of sentence cluster; NMF (non-negative matrix factorization); k-means clustering
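The pipeline outlined in the abstract (NMF-derived semantic features, sentence clustering, coherence-based sentence selection) can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration only: the function names (`nmf`, `kmeans`, `summarize`), the toy term-by-sentence matrix `TOY`, the use of Lee and Seung's multiplicative updates, the farthest-point initialisation of k-means, setting the number of semantic features equal to the number of clusters, and above all the coherence score, which is taken here to be the mean cosine similarity between a cluster's sentences and its centroid. The abstract does not define the coherence measure, and the re-clustering refinement step is omitted.

```python
import math
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(A, r, iters=500, seed=0, eps=1e-9):
    """Approximate non-negative A (terms x sentences) by W x H using
    multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(r)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(m)]
    return W, H

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cosine(p, q):
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return sum(a * b for a, b in zip(p, q)) / norm if norm else 0.0

def kmeans(points, k, iters=20):
    """Plain k-means; farthest-point initialisation is an assumption."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def summarize(A, k):
    """Cluster sentences on their NMF weights and pick one sentence per
    cluster; clusters are ordered by the ASSUMED coherence score."""
    _, H = nmf(A, r=k)
    sentences = transpose(H)  # one weight vector per sentence
    labels, centroids = kmeans(sentences, k)
    picks = []
    for c in range(k):
        members = [j for j, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        sims = [cosine(sentences[j], centroids[c]) for j in members]
        coherence = sum(sims) / len(sims)  # assumed stand-in measure
        best = members[sims.index(max(sims))]
        picks.append((coherence, best))
    return labels, sorted(picks, reverse=True)

# Hypothetical toy data: rows are terms, columns are sentences.
# Sentences 0-2 share one vocabulary, sentences 3-5 another.
TOY = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
]
```

Calling `summarize(TOY, k=2)` returns a cluster label for each sentence together with a list of `(coherence, sentence_index)` pairs, one per non-empty cluster. In practice the matrix would be built from term weights of a real document set, and a library implementation of NMF and k-means would normally replace the hand-written loops.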