http://dx.doi.org/10.5351/KJAS.2011.24.2.315

Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

Kim, Su-Young (Center for Korean Studies Materials, The Academy of Korean Studies)
Publication Information
The Korean Journal of Applied Statistics / v.24, no.2, 2011, pp. 315-321
Abstract
Gene expression microarray data often contain multiple missing values. Most gene expression analyses (including gene clustering), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to discard all the data for a gene, even if the rest of that gene's data are intact. The quality of the analysis may therefore degrade seriously as the missing rate increases. Conversely, imputing the missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this dilemma in microarray clustering analysis, this paper compared the accuracy of clustering with and without imputation over several microarray datasets having different missing rates. It also tested the clustering efficiency of several imputation methods, including our proposed algorithm. The results showed that, for incomplete microarray data, it is worthwhile to check the clustering result in this alternative way, without any imputed data.
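The contrast the abstract draws can be illustrated with a small sketch. This is not the author's proposed algorithm, whose details are not given here; it is a minimal illustration under assumptions of our own. Each gene is a row, `float("nan")` marks a missing value, and every gene is assumed to have at least one observed value. It shows (a) ordinary listwise deletion, (b) a simple row-mean imputation baseline (a stand-in only; the paper compares other imputation methods), and (c) a k-means variant that never imputes the data, measuring distance over each gene's observed entries only (a partial-distance strategy). All function names are hypothetical.

```python
import math

NAN = float("nan")

def drop_incomplete(rows):
    """Ordinary approach: discard every gene (row) that has any missing value."""
    return [r for r in rows if not any(math.isnan(v) for v in r)]

def row_mean_impute(rows):
    """Imputation baseline (assumed, for illustration): fill each gap with the
    mean of that gene's observed values. Returns new lists; input is unchanged."""
    filled = []
    for r in rows:
        obs = [v for v in r if not math.isnan(v)]
        mean = sum(obs) / len(obs)
        filled.append([mean if math.isnan(v) else v for v in r])
    return filled

def partial_sq_dist(x, centroid):
    """Squared Euclidean distance over the observed entries of x only,
    scaled by p / n_observed so genes with different missing counts are comparable."""
    diffs = [(a - b) ** 2 for a, b in zip(x, centroid) if not math.isnan(a)]
    return sum(diffs) * len(x) / len(diffs)

def kmeans_no_impute(rows, k, iters=50):
    """k-means that never fills in the data; returns one cluster label per gene."""
    p = len(rows[0])
    # Deterministic farthest-first seeding. Seed rows are mean-filled only to
    # form starting centroids; the data rows themselves are never imputed.
    seeds = [rows[0]]
    while len(seeds) < k:
        filled = row_mean_impute(seeds)
        seeds.append(max(rows, key=lambda r: min(partial_sq_dist(r, c) for c in filled)))
    centroids = row_mean_impute(seeds)
    labels = [-1] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest centroid under the partial distance.
        new_labels = [min(range(k), key=lambda j: partial_sq_dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each coordinate is the mean of the values members actually observe.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            for d in range(p):
                obs = [r[d] for r in members if not math.isnan(r[d])]
                if obs:  # keep the old coordinate if no member observes dimension d
                    centroids[j][d] = sum(obs) / len(obs)
    return labels

# Toy data (made up for illustration): two groups of three genes, four arrays.
genes = [
    [1.0, 1.1, NAN, 0.9],
    [0.9, NAN, 1.0, 1.0],
    [1.1, 1.0, 0.9, 1.1],
    [5.0, 5.2, 4.9, NAN],
    [NAN, 5.1, 5.0, 4.8],
    [4.9, 5.0, 5.1, 5.0],
]
print(len(drop_incomplete(genes)))  # 2
print(kmeans_no_impute(genes, 2))   # [0, 0, 0, 1, 1, 1]
```

On this toy matrix, listwise deletion keeps only 2 of the 6 genes, whereas the imputation-free k-means still assigns all six, separating the first three genes from the last three. A real comparison like the one the abstract describes would also score each clustering against a reference partition or with an internal validity index.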
Keywords
Microarray; gene expression; clustering; missing value