http://dx.doi.org/10.9728/dcs.2011.12.3.339

A Study on Optimizing the Number of Clusters using External Cluster Relationship Criterion  

Lee, Hyun-Jin (Department of Computer Information and Communication, Korea Cyber University)
Jee, Tae-Chang (Department of Computer Science, Yonsei University)
Publication Information
Journal of Digital Contents Society / v.12, no.3, 2011, pp. 339-345
Abstract
The k-means algorithm is one of the most popular, simple, and fast clustering algorithms, but the correct value of k is not known in advance. The value of k (the number of clusters) matters because the clustering result differs depending on it. In this paper, we present a novel algorithm that determines the number of clusters dynamically, based on an external cluster relationship criterion, which is an evaluation metric for clustering results. Experimental results show that our algorithm is superior to other methods in terms of the accuracy of the number of clusters.
Keywords
Clustering; External Cluster Relationship Criterion; K-means; Number of Clusters
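The abstract's point that the clustering result depends on k can be illustrated with a minimal sketch. It runs a plain Lloyd-style k-means for several values of k and scores each run. The abstract does not define the external cluster relationship criterion, so the score used here is the within-cluster sum of squared errors (SSE), a stand-in chosen only for illustration. The function names (`kmeans`, `sse`, `sse_by_k`), the toy data, and the seed are all assumptions, not the authors' implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # Move the center to the mean of its members.
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def sse(points, centers, labels):
    """Within-cluster sum of squared errors (stand-in criterion, see lead-in)."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))


def sse_by_k(points, ks, seed=0):
    """Run k-means once per candidate k and record the score of each result."""
    scores = {}
    for k in ks:
        centers, labels = kmeans(points, k, seed=seed)
        scores[k] = sse(points, centers, labels)
    return scores


# Toy data (hypothetical): three tight groups of five points each.
OFFSETS = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5), (0.0, 0.0)]
POINTS = [(cx + dx, cy + dy)
          for cx, cy in [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
          for dx, dy in OFFSETS]

if __name__ == "__main__":
    for k, score in sse_by_k(POINTS, range(1, 6)).items():
        print(k, round(score, 2))
```

SSE tends to keep shrinking as k grows, so it cannot pick k on its own. That limitation is one reason a dedicated evaluation criterion, such as the one the paper proposes, is needed to choose the number of clusters.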