A Search-Result Clustering Method based on Word Clustering for Effective Browsing of the Paper Retrieval Results  

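Bae, Kyoung-Man (Department of Computer Engineering, Dong-A University)
Hwang, Jae-Won (Department of Computer Engineering, Dong-A University)
Ko, Young-Joong (Department of Computer Engineering, Dong-A University)
Kim, Jong-Hoon (Department of Computer Engineering, Dong-A University)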
Abstract
The search-results clustering problem is defined as the automatic, online grouping of similar documents among the results returned by a search engine. In this paper, we propose a new search-results clustering algorithm specialized for a paper search service. Our system consists of two phases: the Category Hierarchy Generation System (CHGS) and the Paper Clustering System (PCS). In CHGS, we first build a category hierarchy, called the Field Thesaurus, for each research field. We start from an existing research category hierarchy (KOSEF's research category hierarchy) and expand the keywords of the Field Thesaurus through word clustering with the K-means algorithm. Then, in PCS, the proposed algorithm determines the category of each paper using top-down and bottom-up methods. The proposed system can be applied to retrieval services in specialized fields, such as a paper search service.
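As a rough illustration of the keyword-expansion idea in CHGS, the sketch below clusters words with a plain K-means loop and then adds every word that shares a cluster with a seed keyword. It is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the word representation, the number of clusters, or the initialization, so the toy context vectors, the farthest-first seeding, and the names `kmeans` and `expand_keywords` are all hypothetical.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means with deterministic farthest-first seeding (an assumption)."""
    # Seed with the first point, then repeatedly add the point farthest
    # from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def expand_keywords(seed_keywords, words, vectors, k):
    """Return the seed keywords plus every word in the same cluster as a seed."""
    labels = kmeans(vectors, k)
    seed_clusters = {labels[words.index(w)] for w in seed_keywords if w in words}
    return sorted(w for w, lab in zip(words, labels) if lab in seed_clusters)


# Toy, made-up context vectors (hypothetical data, not from the paper).
words = ["retrieval", "indexing", "query", "protein", "genome", "enzyme"]
vectors = [
    (5, 4, 0, 1), (4, 5, 1, 0), (5, 5, 0, 0),
    (0, 1, 5, 4), (1, 0, 4, 5), (0, 0, 5, 5),
]
expanded = expand_keywords(["retrieval"], words, vectors, k=2)
print(expanded)
```

With these toy vectors, the seed keyword "retrieval" picks up "indexing" and "query", while the three biology terms stay in the other cluster. In a real system the vectors would come from corpus statistics, and the choice of k would need tuning.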
Keywords
Search-Result Clustering; Word Clustering; K-means Algorithm; Search Engine; Information Retrieval