http://dx.doi.org/10.7465/jkdi.2012.23.1.039

Comparison of clustering methods of microarray gene expression data  

Lim, Jin-Soo (Department of Biological Sciences, Busan National University)
Lim, Dong-Hoon (Department of Information Statistics, Gyeongsang National University)
Publication Information
Journal of the Korean Data and Information Science Society / v.23, no.1, 2012, pp. 39-51
Abstract
Cluster analysis has proven to be a useful tool for investigating the association structure among genes and samples in a microarray data set. We applied several cluster validation measures to evaluate the performance of clustering algorithms for microarray gene expression data: hierarchical clustering, K-means, PAM, SOM and model-based clustering. The available validation measures fall into three general categories: internal, stability and biological. The performance of the clustering algorithms is evaluated using simulated data and the SRBCT microarray data. Our results on the simulated data show that nearly all methods perform well, recovering a number of clusters equal to the number of classes in the original data. For the SRBCT data, the best choice for the number of clusters is less clear than for the simulated data. PAM, SOM and model-based clustering gave results similar to those on the simulated data under the Silhouette width, an internal measure, as did PAM and model-based clustering under the biological measures, while model-based clustering had the best value of the stability measures.
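As an illustration of the internal measure named in the abstract, the following is a minimal sketch of the average Silhouette width (Rousseeuw, 1987) used to compare two candidate partitions. It is not the authors' code or workflow: the function name `silhouette_width`, the toy two-dimensional points and both label vectors are assumptions made for this example, and Euclidean distance is assumed throughout.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def silhouette_width(points, labels):
    """Average silhouette width of a hard partition (hypothetical helper)."""
    # Group the points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # Convention: a point in a singleton cluster contributes s(i) = 0.
            continue
        # a: mean distance from p to the other members of its own cluster
        # (dist(p, p) = 0, so summing over the whole cluster is harmless).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (assumed for illustration): two well-separated groups of three points.
toy_points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
two_clusters = [0, 0, 0, 1, 1, 1]    # matches the two underlying groups
three_clusters = [0, 0, 0, 1, 1, 2]  # splits the second group unnecessarily

width_k2 = silhouette_width(toy_points, two_clusters)
width_k3 = silhouette_width(toy_points, three_clusters)
```

On this toy example the two-cluster partition receives the larger average width, which is the sense in which an internal measure can indicate a number of clusters matching the number of classes in the data.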
Keywords
Biological measure; cluster analysis; internal measure; microarray; stability measure