Incremental Clustering Algorithm by Modulating Vigilance Parameter Dynamically  

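신광철 (Shin Kwang-cheol; School of Computer Engineering, Chung-Ang University)
한상용 (Han Sang-yong; School of Computer Engineering, Chung-Ang University)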
Abstract
This study proposes a new clustering algorithm that supports incremental categorization of large document collections. The proposed algorithm adopts properties of the spherical k-means algorithm, which clusters large volumes of high-dimensional documents, and of the fuzzy ART (adaptive resonance theory) neural network, which clusters incrementally. In short, it combines the vector space model and concept vectors of spherical k-means with the vigilance parameter of fuzzy ART. The new algorithm not only supports incremental clustering and sets an appropriate number of clusters automatically, but also addresses the overfitting caused by outliers and noise. In tests on the CLASSIC3 data set, the proposed algorithm also improved on spherical k-means by 8.04% on average in the objective function value, which measures cluster coherence and is used to evaluate the quality of the produced clusters.
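The combination described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how the vigilance parameter is modulated, so the sketch assumes a fixed threshold. The function names (`unit`, `dot`, `incremental_cluster`, `objective`) are hypothetical, and the objective shown is the usual spherical k-means coherence, the sum of cosine similarities between each document and its cluster's concept vector.

```python
import math

def unit(v):
    """Scale a vector to unit length (vector space model with cosine similarity)."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else list(v)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def incremental_cluster(docs, vigilance):
    """One pass over docs: join the most similar cluster if its cosine
    similarity reaches the vigilance threshold, otherwise open a new cluster.
    Assumption: vigilance is a fixed value here, not modulated dynamically."""
    sums = []    # per-cluster sum of unit document vectors
    labels = []
    for doc in docs:
        x = unit(doc)
        # Concept vector of a cluster = normalized sum of its members.
        sims = [dot(x, unit(s)) for s in sums]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else -1
        if best >= 0 and sims[best] >= vigilance:
            sums[best] = [a + b for a, b in zip(sums[best], x)]
            labels.append(best)
        else:
            sums.append(list(x))
            labels.append(len(sums) - 1)
    concepts = [unit(s) for s in sums]
    return labels, concepts

def objective(docs, labels, concepts):
    """Cluster coherence: sum of cosine similarities to the assigned concept vector."""
    return sum(dot(unit(d), concepts[k]) for d, k in zip(docs, labels))

docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, concepts = incremental_cluster(docs, vigilance=0.8)
print(labels, len(concepts), objective(docs, labels, concepts))
```

With this toy input, a vigilance of 0.8 yields two clusters, while a stricter value such as 0.999 places every document in its own cluster. This shows how the threshold, rather than a preset k, determines the number of clusters.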
Keywords
spherical k-means; vector space model; concept vector; fuzzy ART; vigilance parameter