http://dx.doi.org/10.5351/KJAS.2009.22.4.745

A Comparison of Cluster Analyses and Clustering of Sensory Data on Hanwoo Bulls  

Kim, Jae-Hee (Department of Statistics, Duksung Women's University)
Ko, Yoon-Sil (Department of Statistics, Duksung Women's University)
Publication Information
The Korean Journal of Applied Statistics, v.22, no.4, 2009, pp. 745-758
Abstract
Cluster analysis is the automated search for groups of related observations in a data set. Many techniques have been proposed for grouping observations into clusters, and a variety of measures have been suggested for validating the results of a cluster analysis. In this paper, we compare complete linkage, Ward's method, K-means, and model-based clustering, and compute validity measures such as connectivity, the Dunn index, and silhouette width on data simulated from multivariate distributions. We also select a clustering algorithm and determine the number of clusters of Korean consumers based on their palatability scores for Hanwoo bull prepared by the BBQ cooking method.
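As an illustration of two of the validity measures named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes Euclidean distance, the common definition of the Dunn index (smallest between-cluster distance divided by largest cluster diameter), and the average silhouette width of Rousseeuw (1987). The function names and the toy data are hypothetical, and connectivity is not covered.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Smallest distance between points in different clusters divided by
    the largest within-cluster distance. Needs at least two clusters and
    at least one cluster with two distinct points."""
    min_between = math.inf
    max_diameter = 0.0
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            max_diameter = max(max_diameter, d)
        else:
            min_between = min(min_between, d)
    return min_between / max_diameter


def silhouette_width(points, labels):
    """Average of s(i) = (b - a) / max(a, b), where a is the mean distance
    from point i to its own cluster and b is the smallest mean distance
    from point i to any other cluster. Singletons contribute s(i) = 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: s(i) is taken to be 0
        a = mean_dist(i, own)
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy data: two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(name, dunn_index(toy_points, labels), silhouette_width(toy_points, labels))
```

Both measures are larger for better-separated, more compact partitions, so the labelling that matches the two groups should score higher on each than the mixed labelling.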
Keywords
Average Distance (AD); Average Proportion (APN); complete linkage; connectivity; Dunn Index; K-means; model-based clustering; silhouette width; Ward's method
References
1 Pollard, D. (1982). Central limit theorems for K-means clustering, Annals of Statistics, 10, 919-926
2 Rousseeuw, P. J. (1987). Silhouettes: Graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, 20, 53-65
3 Scott, A. J. and Symons, M. (1971). Clustering methods based on likelihood ratio criteria, Biometrics, 27, 387-397
4 Ward, Jr., J. H. (1963). Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, 58, 236-244
5 Yeung, K. Y., Fraley, C., Murua, A., Raftery, A. E. and Ruzzo, W. L. (2001). Model-based clustering and data transformations for gene expression data, Bioinformatics, 17, 977-987
6 Banfield, J. D. and Raftery, A. E. (1993). Model-based Gaussian and Non-Gaussian clustering, Biometrics, 49, 803-821
7 Brock, G., Pihur, V., Datta, S. and Datta, S. (2008). clValid: An R package for cluster validation, Journal of Statistical Software, 25, 1-21
8 Datta, S. and Datta, S. (2003). Comparisons and validation of statistical clustering techniques for microarray gene expression data, Bioinformatics, 19, 459-466
9 Dunn, J. C. (1974). Well-separated clusters and optimal fuzzy partitions, Journal of Cybernetics, 4, 95-104
10 Fraley, C. and Raftery, A. E. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis, The Computer Journal, 41, 578-588
11 Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation, Journal of the American Statistical Association, 97, 611-631
12 Handl, J., Knowles, J. and Kell, D. B. (2005). Computational cluster validation in post-genomic data analysis, Bioinformatics, 21, 3201-3212
13 Hartigan, J. A. and Wong, M. A. (1979). K-means clustering algorithm, Applied Statistics, 28, 100-108
14 Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York
15 Pollard, D. (1981). Strong consistency of K-means clustering, Annals of Statistics, 9, 135-140