
http://dx.doi.org/10.5351/KJAS.2011.24.2.315

Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

Kim, Su-Young (Center for Korean Studies Materials, The Academy of Korean Studies)

Publication Information

The Korean Journal of Applied Statistics / v.24, no.2, 2011, pp. 315-321

Abstract

Gene expression microarray data often include multiple missing values. Most gene expression analyses (including gene clustering analysis), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to discard all the data for a gene, even if the rest of the data for that gene are intact. The quality of the analysis may decrease seriously as the missing rate increases. On the other hand, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this trade-off in microarray clustering analysis, this paper compared the accuracy of clustering with and without imputation over several microarray data sets with different missing rates. This paper also tested the clustering efficiency of several imputation methods, including our proposed algorithm. The results showed that, for imperfect microarray data, it is worthwhile to check the clustering result in this alternative way, without any imputed data.
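
The trade-off described in the abstract can be illustrated with a small sketch. This is not the algorithm proposed in the paper, which the abstract does not specify. It is a toy comparison, assuming NumPy and a made-up 4-gene matrix, of three generic ways to handle missing entries before clustering: discarding incomplete genes, filling gaps with the gene's own mean, and computing distances only over the values observed in both genes. The function names `complete_case_rows`, `row_mean_impute`, and `partial_distance` are hypothetical and chosen for illustration only.

```python
import numpy as np

def complete_case_rows(X):
    """Keep only genes (rows) that have no missing value."""
    return X[~np.isnan(X).any(axis=1)]

def row_mean_impute(X):
    """Fill each missing entry with the mean of that gene's observed values."""
    filled = X.copy()
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = row_means[rows]
    return filled

def partial_distance(a, b):
    """Euclidean distance over coordinates observed in both vectors,
    rescaled to the full vector length (an imputation-free distance)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return np.nan  # no shared observations, distance undefined
    d2 = np.sum((a[mask] - b[mask]) ** 2)
    return np.sqrt(d2 * a.size / mask.sum())

# Toy expression matrix (illustrative values): rows are genes, columns are arrays.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, np.nan, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [np.nan, 3.0, 2.0, np.nan],
])

kept = complete_case_rows(X)   # drops the two genes with any missing value
imputed = row_mean_impute(X)   # keeps all genes but invents values
d = partial_distance(X[0], X[3])  # uses only the two shared columns

print(kept.shape, imputed.shape, d)
```

In this toy matrix, complete-case filtering loses half of the genes, while mean imputation keeps every gene at the cost of inserting values that were never measured. A distance of the third kind could be plugged into a clustering routine that accepts a custom distance, which is one way to cluster without imputed data.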

Keywords

Microarray; gene expression; clustering; missing value
