http://dx.doi.org/10.5392/JKCA.2010.10.5.029

Topic based Web Document Clustering using Named Entities  

Sung, Ki-Youn (INITECH Co., Ltd.)
Yun, Bo-Hyun (Mokwon University)
Abstract
Previous clustering research has focused on extracting keywords and grouping documents by word similarity. However, the large number of candidates to compare and compute leads to high complexity, low speed, and low accuracy. To overcome these weaknesses, this paper proposes a topic-based web document clustering model that uses not only keywords but also named entities such as person names, organizations, and locations. Through several experiments, we demonstrate the effectiveness of our model compared with a traditional keyword-only model and analyze how the effects vary with the characteristics of the document collection.
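The abstract does not spell out how keywords and named entities are combined, so the following is only a minimal illustrative sketch, not the authors' model. It assumes each document is already reduced to two bags of terms (keywords and named entities), scores a pair of documents with a weighted sum of two cosine similarities, and groups documents with a simple single-pass threshold clustering. The names `combined_similarity`, `cluster`, `entity_weight`, and `threshold`, and the toy documents, are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(doc_a, doc_b, entity_weight=0.5):
    """Weighted sum of keyword similarity and named-entity similarity.

    entity_weight is a hypothetical tuning parameter; 0.0 falls back to a
    keyword-only comparison.
    """
    kw = cosine(doc_a["keywords"], doc_b["keywords"])
    ne = cosine(doc_a["entities"], doc_b["entities"])
    return (1 - entity_weight) * kw + entity_weight * ne


def cluster(docs, threshold=0.6, entity_weight=0.5):
    """Single-pass clustering (an assumed, deliberately simple scheme).

    Each document joins the first cluster whose seed (first member) is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []
    for i, doc in enumerate(docs):
        for members in clusters:
            seed = docs[members[0]]
            if combined_similarity(seed, doc, entity_weight) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy documents (invented for illustration only).
docs = [
    {"keywords": Counter(["election", "vote"]),
     "entities": Counter(["Seoul", "National Assembly"])},
    {"keywords": Counter(["vote", "poll"]),
     "entities": Counter(["Seoul", "National Assembly"])},
    {"keywords": Counter(["match", "goal"]),
     "entities": Counter(["FIFA", "Busan"])},
]

print(cluster(docs, entity_weight=0.5))  # keywords plus named entities
print(cluster(docs, entity_weight=0.0))  # keywords only
```

On this toy data, the first two documents share only one keyword but the same named entities, so they are grouped together only when the entity term contributes to the score. This says nothing about the paper's measured results; it only shows the kind of signal named entities can add to a keyword comparison.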
Keywords
Creativity; Educational Game; Edutainment;