Development of Similarity-Based Document Clustering System

Development of a Document Clustering System Using Similarity Coefficients

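  • 우훈식 (School of Computer, Information and Communication Engineering, Daejeon University)
  • 임동순 (Department of Industrial and Systems Engineering, Hannam University)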
  • Published : 2002.05.01

Abstract

Data clustering is of great interest in many data mining applications. In document clustering, each document is represented as a data point in a high-dimensional space, so documents can be clustered with general-purpose data clustering techniques. In this paper, we introduce a document clustering system based on the similarity among documents. The system provides three functions: 1) gathering documents with a search agent; 2) computing a similarity coefficient for every pair of documents from their term frequencies; and 3) clustering the documents using these similarity coefficients. In particular, clustering is performed by a hybrid algorithm that combines a genetic algorithm with the K-Means method.
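The abstract names the pipeline (term frequencies, then pairwise similarity coefficients, then a genetic/K-Means hybrid) but not the exact coefficient or genetic operators. The following is a minimal sketch of steps 2 and 3 under stated assumptions, not the authors' method: cosine similarity over raw term counts with naive whitespace tokenization, and medoids in place of means (a K-medoids-style refinement, since only a pairwise similarity matrix is available). The GA uses elitist selection and single-medoid mutation without crossover. All function names are hypothetical, and the search-agent step is not covered.

```python
import math
import random
from collections import Counter


def term_frequencies(doc):
    """Term -> count map (assumption: lower-case whitespace tokenization)."""
    return Counter(doc.lower().split())


def cosine(tf_a, tf_b):
    """Cosine similarity coefficient between two term-frequency vectors."""
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(docs):
    """Similarity coefficients for every pair of documents."""
    tfs = [term_frequencies(d) for d in docs]
    return [[cosine(a, b) for b in tfs] for a in tfs]


def assign(medoids, sim):
    """Label each document with the index of its most similar medoid."""
    labels = []
    for j in range(len(sim)):
        if j in medoids:
            labels.append(medoids.index(j))  # a medoid always stays in its own cluster
        else:
            labels.append(max(range(len(medoids)), key=lambda c: sim[j][medoids[c]]))
    return labels


def fitness(medoids, sim):
    """Total similarity of every document to its assigned medoid (higher is better)."""
    labels = assign(medoids, sim)
    return sum(sim[j][medoids[c]] for j, c in enumerate(labels))


def refine(medoids, sim):
    """K-medoids-style local step: move each medoid to the member most similar to its cluster."""
    labels = assign(medoids, sim)
    new = []
    for c in range(len(medoids)):
        members = [j for j, lab in enumerate(labels) if lab == c]
        new.append(max(members, key=lambda m: sum(sim[m][j] for j in members)))
    return new


def mutate(medoids, n, rng):
    """Replace one randomly chosen medoid with a random non-medoid document."""
    candidates = [j for j in range(n) if j not in medoids]
    child = list(medoids)
    if candidates:
        child[rng.randrange(len(child))] = rng.choice(candidates)
    return child


def hybrid_cluster(docs, k, pop_size=8, generations=10, seed=0):
    """GA over sets of k medoids with a local refinement step; returns cluster labels."""
    rng = random.Random(seed)
    sim = similarity_matrix(docs)
    n = len(docs)
    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population = [refine(p, sim) for p in population]
        population.sort(key=lambda p: fitness(p, sim), reverse=True)
        survivors = population[: max(1, pop_size // 2)]  # elitist selection
        children = [mutate(rng.choice(survivors), n, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = max(population, key=lambda p: fitness(p, sim))
    return assign(best, sim)


docs = ["apple banana apple", "banana apple fruit",
        "python code compiler", "compiler code bug"]
labels = hybrid_cluster(docs, k=2)
print(labels)
```

On this toy input the two fruit documents share one label and the two compiler documents share the other. The genetic search explores different medoid sets globally, while the refinement step plays the local-improvement role that K-Means plays in such hybrids.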

Keywords