Incremental Clustering Algorithm by Modulating Vigilance Parameter Dynamically


Journal of KIISE: Software and Applications

Vol. 30, No. 11 / pp. 1072-1079 / 2003 / ISSN 1229-6848 (print)

Korean Institute of Information Scientists and Engineers (KIISE)


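신광철 (School of Computer Science and Engineering, Chung-Ang University);
한상용 (School of Computer Science and Engineering, Chung-Ang University)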

Published: 2003-12-01


Abstract

This paper proposes a new clustering algorithm that can categorize large document collections incrementally. The algorithm draws on two existing methods:

- the spherical k-means (SKM) algorithm, which clusters large sets of high-dimensional documents;
- the fuzzy ART (adaptive resonance theory) neural network, which clusters incrementally.

Specifically, it builds on the vector space model and concept vectors of SKM and combines them with the vigilance parameter of fuzzy ART. The algorithm supports incremental clustering and automatically determines the appropriate number of clusters. It also addresses the overfitting caused by outliers and noise. Cluster quality was evaluated with an objective function that measures cluster coherence. In experiments on the CLASSIC3 data set, the proposed algorithm achieved on average 8.04% higher coherence than the original SKM.
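The abstract names the ingredients of the method but not its exact update rules. The following is a minimal Python sketch of one way a fuzzy-ART-style vigilance test can be layered on SKM concept vectors to cluster documents in a single incremental pass. It is written under stated assumptions and is not the authors' algorithm:

- The class `IncrementalSKM` and the threshold `rho` are hypothetical names.
- Cosine similarity is swapped in for fuzzy ART's own category-choice and match functions.
- The vigilance value is held fixed, because the abstract does not describe how the paper modulates it dynamically.
- SKM's batch refinement passes are omitted.
- Documents are assumed to be dense term-weight vectors, for example TF-IDF.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale v to unit L2 norm; SKM places every document on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(u: Vector, v: Vector) -> float:
    """Inner product; equals cosine similarity when u and v are unit vectors."""
    return sum(a * b for a, b in zip(u, v))


class IncrementalSKM:
    """Hypothetical one-pass clusterer: SKM concept vectors plus an ART-style
    vigilance test with a FIXED threshold rho (an assumption; the paper's
    dynamic modulation rule is not given in the abstract)."""

    def __init__(self, rho: float) -> None:
        self.rho = rho                  # assumed vigilance threshold on cosine
        self.sums: List[Vector] = []    # running sum of unit vectors per cluster

    def concept(self, j: int) -> Vector:
        # SKM concept vector: the cluster mean scaled to unit length. The sum
        # and the mean point in the same direction, so normalizing the sum works.
        return normalize(self.sums[j])

    def add(self, doc: Vector) -> int:
        """Assign one document and return its cluster index."""
        x = normalize(doc)
        # Category choice: find the most similar existing concept vector.
        best, best_sim = -1, -1.0
        for j in range(len(self.sums)):
            sim = dot(x, self.concept(j))
            if sim > best_sim:
                best, best_sim = j, sim
        # Vigilance test: join the best cluster only if it is similar enough.
        # Every other cluster is less similar, so if the best fails, all fail.
        if best >= 0 and best_sim >= self.rho:
            self.sums[best] = [a + b for a, b in zip(self.sums[best], x)]
            return best
        # No cluster passes the test: open a new one seeded by this document.
        self.sums.append(list(x))
        return len(self.sums) - 1

    def objective(self) -> float:
        """SKM coherence: sum over clusters j of sum_{x in j} x . c_j.
        Since c_j = s_j / ||s_j||, each inner sum equals ||s_j||."""
        return sum(math.sqrt(dot(s, s)) for s in self.sums)


# Usage on four toy 3-term documents.
docs = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.1]]
model = IncrementalSKM(rho=0.8)
labels = [model.add(d) for d in docs]  # → [0, 0, 1, 1]
```

Raising `rho` makes the vigilance test stricter. More documents then fail it and open new clusters; lowering `rho` merges more documents into existing ones. In this sketch the number of clusters therefore follows from the vigilance value, rather than being fixed in advance as in plain SKM. The `objective` method computes the SKM coherence measure, which is the kind of quantity the abstract uses to compare cluster quality.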

