http://dx.doi.org/10.5351/KJAS.2007.20.1.167

Comparison of the Cluster Validation Methods for High-dimensional (Gene Expression) Data  

Jeong, Yun-Kyoung (Department of Statistics, Chonnam National University)
Baek, Jang-Sun (Department of Statistics, Chonnam National University)
Publication Information
The Korean Journal of Applied Statistics, v.20, no.1, 2007, pp. 167-181
Abstract
Many clustering algorithms and cluster validation techniques have been proposed for high-dimensional gene expression data. The validation techniques themselves, however, have seldom been evaluated. In this paper we compare various cluster validity indices on low-dimensional simulated data and real gene expression data. Among the internal measures, Dunn's index is the most effective and robust, the Silhouette index is next, and the Davies-Bouldin index performs worst. Among the external measures, the Jaccard index is much more effective than the Goodman-Kruskal index and the adjusted Rand index.
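To make the distinction between internal and external measures concrete, the following is a minimal sketch of one index of each kind. It assumes the commonly used definitions (Dunn's index as the smallest between-cluster distance over the largest cluster diameter, and the pair-counting form of the Jaccard index); the abstract does not state which variants the paper used, and the function names and toy data are illustrative only.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Internal measure: smallest between-cluster distance / largest diameter.

    Higher is better. Assumes at least two clusters and at least one
    cluster with two distinct points (otherwise the ratio is undefined).
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    # Diameter: largest pairwise distance inside a cluster (0 for singletons).
    max_diameter = max(
        max((dist(a, b) for a, b in combinations(g, 2)), default=0.0)
        for g in groups
    )
    # Separation: smallest distance between points of different clusters.
    min_separation = min(
        dist(a, b)
        for g1, g2 in combinations(groups, 2)
        for a in g1
        for b in g2
    )
    return min_separation / max_diameter


def jaccard_index(true_labels, pred_labels):
    """External measure: pair-counting Jaccard, a / (a + b + c).

    a = pairs grouped together in both partitions,
    b = pairs together only in the reference partition,
    c = pairs together only in the clustering being evaluated.
    """
    a = b = c = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            a += 1
        elif same_true:
            b += 1
        elif same_pred:
            c += 1
    return a / (a + b + c)


# Toy data (hypothetical): two tight groups far apart on the x-axis.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
truth = [0, 0, 1, 1]
good = [0, 0, 1, 1]  # recovers the two groups
bad = [0, 1, 0, 1]   # splits each group across clusters

print(dunn_index(points, good), dunn_index(points, bad))
print(jaccard_index(truth, good), jaccard_index(truth, bad))
```

Dunn's index needs only the data and the cluster labels, whereas the Jaccard index needs a reference partition, which is why the second kind of measure is called external.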
Keywords
Gene expression data; cluster analysis; cluster validation
References
1 Hubert, L. and Arabie, P. (1985). Comparing partitions, Journal of Classification, 2, 193-218
2 Pauwels, E. J. and Frederix, G. (1999). Finding salient regions in images: nonparametric clustering for image segmentation and grouping, Computer Vision and Image Understanding, 75, 73-85
3 Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, 20, 53-65
4 Yeung, K. Y. and Ruzzo, W. L. (2000). An Empirical Study on Principal Component Analysis for Clustering Gene Expression Data, Technical Report UW-CSE-2000-11-03, Department of Computer Science and Engineering, University of Washington
5 Bezdek, J. C. and Pal, N. R. (1998). Some new indexes of cluster validity, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, 28, Issue 3, 301-315
6 Bolshakova, N. and Azuaje, F. (2003a). Improving expression data mining through cluster validation, Conference Proceedings. 4th International IEEE EMBS Special Topic Conference on Information Technology Applications in Biomedicine 2003, 19-22
7 Jaccard, P. (1912). The distribution of flora in the alpine zone, New Phytologist, 11, 37-50
8 Dunn, J. (1974). Well separated clusters and optimal fuzzy partitions, Journal of Cybernetics, 4, 95-104
9 Fort, G. and Lambert-Lacroix, S. (2005). Classification using partial least squares with penalized logistic regression, Bioinformatics, 21, 1104-1111
10 Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, 66, 846-850
11 Handl, J., Knowles, J. and Kell, D. B. (2005). Computational cluster validation in postgenomic data analysis, Bioinformatics, 21, 3201-3212
12 Hubert, L. and Schultz, J. (1976). Quadratic assignment as a general data-analysis strategy, The British Journal of Mathematical & Statistical Psychology, 29, 190-241
13 Bolshakova, N. and Azuaje, F. (2003b). Cluster validation techniques for genome expression data classification, Signal Processing, 83, 825-833
14 Davies, D. L. and Bouldin, D. W. (1979). A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 224-227
15 Goodman, L. and Kruskal, W. (1954). Measures of association for cross classifications, Journal of the American Statistical Association, 49, 732-764