Identification of Cluster with Composite Mean and Variance

  • Kim, Seung-Gu (Department of Data & Information, Sangji University)
  • Received : 2011.03
  • Accepted : 2011.04
  • Published : 2011.05.31

Abstract

Consider a cluster, called a 'son cluster', whose mean and variance are composed of the means and variances of two other clusters in the data, called the 'father cluster' and the 'mother cluster'. In this paper, we model the relationship among these clusters in terms of their means and variances and provide a method for identifying each of the three clusters. The observations are assumed to follow a normal mixture model, and the parameters are estimated via the EM algorithm. Several difficulties arose in the estimation, which we were able to overcome reasonably well using an ECM approximation. Numerical examples show that our method can effectively identify the three clusters, which we call a 'family of clusters'.
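
To make the setup concrete, here is a minimal sketch under stated assumptions; it is not the paper's estimation procedure. It assumes univariate data and a hypothetical composition rule in which the son cluster's mean is a weighted average of the parents' means and its variance is the matching weighted sum of their variances. The weight `alpha`, the function names `family_params` and `e_step`, and all numeric values are illustrative and do not come from the paper. Only the E-step of a normal mixture with known parameters is shown; the constrained M-step and the ECM approximation mentioned in the abstract are not reproduced.

```python
import numpy as np


def family_params(mu_f, var_f, mu_m, var_m, alpha):
    """Build (father, son, mother) means and variances.

    ASSUMPTION: the son cluster is composed as
        mu_s  = alpha * mu_f + (1 - alpha) * mu_m
        var_s = alpha**2 * var_f + (1 - alpha)**2 * var_m
    This rule is hypothetical and chosen only for illustration.
    """
    mu_s = alpha * mu_f + (1 - alpha) * mu_m
    var_s = alpha**2 * var_f + (1 - alpha) ** 2 * var_m
    return np.array([mu_f, mu_s, mu_m]), np.array([var_f, var_s, var_m])


def e_step(x, weights, means, variances):
    """Posterior probability of each cluster for each observation."""
    x = np.asarray(x, dtype=float)[:, None]  # shape (n, 1)
    log_dens = -0.5 * (np.log(2 * np.pi * variances) + (x - means) ** 2 / variances)
    log_post = np.log(weights) + log_dens
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


# Simulate a 'family of clusters' with illustrative (made-up) parameters.
rng = np.random.default_rng(0)
means, variances = family_params(mu_f=-4.0, var_f=1.0, mu_m=4.0, var_m=1.0, alpha=0.5)
weights = np.array([0.4, 0.2, 0.4])  # father, son, mother proportions
labels = rng.choice(3, size=600, p=weights)
x = rng.normal(means[labels], np.sqrt(variances[labels]))

# Assign each observation to the cluster with the highest posterior.
post = e_step(x, weights, means, variances)
assigned = post.argmax(axis=1)
accuracy = (assigned == labels).mean()
print(f"fraction of observations assigned to their true cluster: {accuracy:.3f}")
```

In a full EM fit, these posterior probabilities would feed an M-step that updates the father and mother parameters while keeping the son cluster tied to them through the composition rule.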
