Analysis of Saccharomyces Cell Cycle Expression Data Using Bayesian Validation of Fuzzy Clustering

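  • Si-Ho Yoo (유시호; Department of Computer Science, Yonsei University) ;
  • Hong-Hee Won (원홍희; Department of Computer Science, Yonsei University) ;
  • Sung-Bae Cho (조성배; Department of Computer Science, Yonsei University)
  • Published: 2004.12.01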

Abstract

Clustering, one of the methods used to analyze genes, groups genes with similar functions and is used to identify the function of a gene group or to infer the functions of unknown genes. Because a gene can belong to several functional families, fuzzy clustering is better suited to gene clustering than conventional hard clustering methods, which assign each gene to exactly one cluster. This paper proposes a Bayesian validation method for evaluating fuzzy partitions effectively. The method is probability-based: given the data, it selects the fuzzy partition with the largest posterior probability. First, the proposed method is compared with four representative conventional fuzzy cluster validity measures on four well-known datasets, using the fuzzy c-means algorithm. Then the Saccharomyces (budding yeast) cell cycle expression data are clustered, and the results are validated with the proposed method and analyzed.
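
As a rough illustration of the setting described in the abstract, the sketch below runs a basic fuzzy c-means on synthetic data for several candidate cluster counts and scores each fuzzy partition. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the formula of the Bayesian score, so Bezdek's partition coefficient, a classic fuzzy validity measure, is used here only as a stand-in, and the abstract does not say whether it is one of the four measures compared. The function names, the fuzzifier m = 2, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Basic fuzzy c-means. Returns (centers, U); U[i, k] is the
    membership of sample i in cluster k, and each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized per sample.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Standard membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def partition_coefficient(U):
    """Bezdek's partition coefficient: mean over samples of the sum of
    squared memberships. Ranges from 1/c (maximally fuzzy) to 1 (crisp).
    Used here as a stand-in score, NOT the paper's Bayesian score."""
    return float((U ** 2).sum() / U.shape[0])


# Synthetic data (assumption): three Gaussian blobs in two dimensions.
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([mu + 0.5 * rng.standard_normal((50, 2)) for mu in blob_centers])

# Score one fuzzy partition per candidate number of clusters.
scores = {}
for c in range(2, 6):
    _, U = fuzzy_c_means(X, c)
    scores[c] = partition_coefficient(U)
    print(f"c={c}  partition coefficient={scores[c]:.3f}")

# Keep the candidate with the best score; a Bayesian validation method
# would instead keep the partition with the largest posterior probability.
best_c = max(scores, key=scores.get)
```

The selection step is the part a validation method replaces: the loop stays the same, and only the scoring function changes.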
