http://dx.doi.org/10.5391/IJFIS.2010.10.1.031

Empirical Comparisons of Clustering Algorithms using Silhouette Information  

Jun, Sung-Hae (Department of Bioinformatics & Statistics, Cheongju University)
Lee, Seung-Joo (Department of Bioinformatics & Statistics, Cheongju University)
Publication Information
International Journal of Fuzzy Logic and Intelligent Systems / v.10, no.1, 2010, pp. 31-36
Abstract
Many clustering algorithms have been used in diverse fields. When a given data set must be grouped into clusters, many clustering algorithms based on similarity or distance measures can be considered. Most clustering work has relied on hierarchical and non-hierarchical clustering algorithms. In general, researchers have chosen among these algorithms case by case, and they have had to select a suitable clustering method subjectively, relying on their prior knowledge. In this paper, to address this subjectivity, we make empirical comparisons of popular hierarchical and non-hierarchical clustering algorithms using the silhouette measure. We use silhouette information to evaluate clustering results such as the number of clusters and cluster variance. We verify our comparison study with experiments on data sets from the UCI Machine Learning Repository. This allows us to use efficient and objective clustering algorithms.
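As a rough illustration of how silhouette information can rank competing clustering results, the sketch below computes the silhouette width of Rousseeuw (1987) in plain Python and compares two candidate partitions of the same points. It is not the authors' experimental code: the toy data, the candidate labelings, the Euclidean distance, and the function names `silhouette_widths` and `average_silhouette` are all assumptions made for this example.

```python
from math import dist


def silhouette_widths(points, labels):
    """Return the silhouette width s(i) of every point.

    Assumes Euclidean distance and at least two distinct cluster labels.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention: a point that is alone in its cluster gets s(i) = 0.
            widths.append(0.0)
            continue
        # a(i): mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def average_silhouette(points, labels):
    """Mean silhouette width over all points; higher suggests a better partition."""
    widths = silhouette_widths(points, labels)
    return sum(widths) / len(widths)


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Hypothetical candidate partitions, e.g. from two algorithms or two values of k.
candidates = {
    "k=2": [0, 0, 0, 1, 1, 1],
    "k=3": [0, 0, 1, 2, 2, 2],
}

scores = {name: average_silhouette(points, labels)
          for name, labels in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The same comparison applies whether the labelings come from different algorithms (hierarchical versus non-hierarchical) or from one algorithm run with different numbers of clusters: the partition with the larger average silhouette width is preferred, which replaces a subjective choice with a measurable one.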
Keywords
Objective clustering; Subjective clustering; Silhouette information; Number of clusters