Double K-Means Clustering

이중 K-평균 군집화

  • 허명회 (Department of Statistics, College of Political Science and Economics, Korea University)
  • Published : 2000.09.01

Abstract

In this study, the author proposes a nonhierarchical clustering method, called "Double K-Means Clustering", which clusters multivariate observations with the following algorithm:

  • Step I: Carry out ordinary K-means clustering and obtain k temporary clusters with sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$, and pooled covariance matrix $S$.
  • Step II-1: Allocate the observation $x_i$ to cluster $j$ if it satisfies ....., where $N$ is the total number of observations, for $i = 1, \ldots, N$.
  • Step II-2: Update the cluster sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$, and pooled covariance matrix $S$.
  • Step II-3: Repeat Steps II-1 and II-2 until the change becomes negligible.

Double K-means clustering is nearly "optimal" under a mixture of k multivariate normal distributions with a common covariance matrix. It is also nearly affine invariant, with the data-analytic implication that variable standardization is largely unnecessary. The method is demonstrated numerically on Fisher's iris data.
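The steps listed in the abstract can be sketched in Python with NumPy. This is an illustrative sketch, not the author's implementation. The abstract elides the Step II-1 allocation criterion, so the code assumes a rule that is standard for a normal mixture with a common covariance matrix: assign $x_i$ to the cluster $j$ minimizing $(x_i - c_j)^\top S^{-1} (x_i - c_j) - 2 \log(n_j / N)$. The function names (`kmeans`, `summarize`, `double_kmeans`), the random initialization, the divisor $N - k$ for the pooled covariance, and the stopping rule (labels no longer change) are likewise assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Step I: ordinary K-means (Lloyd's algorithm, Euclidean distance)."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct observations chosen at random.
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids


def summarize(X, labels, k, old_centroids):
    """Cluster sizes n_j, centroids c_j, and pooled covariance S."""
    N = X.shape[0]
    sizes = np.bincount(labels, minlength=k)
    # An empty cluster keeps its previous centroid.
    centroids = np.array([
        X[labels == j].mean(axis=0) if sizes[j] > 0 else old_centroids[j]
        for j in range(k)
    ])
    resid = X - centroids[labels]
    S = resid.T @ resid / (N - k)  # assumed divisor N - k
    return sizes, centroids, S


def double_kmeans(X, k, max_iter=100, seed=0):
    """Step I followed by Steps II-1 to II-3 under the ASSUMED allocation rule."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    labels, centroids = kmeans(X, k, seed=seed)
    sizes, centroids, S = summarize(X, labels, k, centroids)
    for _ in range(max_iter):
        # Step II-1 (assumed): Mahalanobis distance penalized by cluster size.
        diff = X[:, None, :] - centroids[None, :, :]
        maha = np.einsum("nkp,pq,nkq->nk", diff, np.linalg.inv(S), diff)
        with np.errstate(divide="ignore"):
            score = maha - 2.0 * np.log(sizes / N)  # empty cluster scores +inf
        new_labels = score.argmin(axis=1)
        # Step II-3: stop once the allocation no longer changes.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step II-2: update sizes, centroids, and pooled covariance.
        sizes, centroids, S = summarize(X, labels, k, centroids)
    return labels, centroids, S
```

Under the assumed rule, the $-2 \log(n_j / N)$ term makes it harder for a small cluster to take observations from a large one, and measuring distance through $S^{-1}$ is what would make the second stage insensitive to rescaling of the variables. Calling `double_kmeans(X, k=3)` on a four-column array of the iris measurements would mirror the demonstration mentioned in the abstract.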

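K-means clustering is a nonhierarchical clustering method known to be efficient for clustering observations in large data sets. However, it often makes the error of handing part of a relatively homogeneous large cluster over to a small cluster. This study identifies that phenomenon precisely and proposes the "double K-means clustering" method as a remedy. The new method is also applied to an empirical example and discussed.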
