Gene Screening and Clustering of Yeast Microarray Gene Expression Data

  • Lee, Kyung-A (Department of Statistics, Duksung Women's University) ;
  • Kim, Tae-Houn (Department of PrePharmMed, Duksung Women's University) ;
  • Kim, Jae-Hee (Department of Statistics, Duksung Women's University)
  • Received : 2011.09
  • Accepted : 2011.11
  • Published : 2011.12.31

Abstract

We perform cluster analyses of the yeast cdc15 cell-cycle microarray gene expression data. To reflect the time-course nature of the data, we first screen for differentially expressed genes using test statistics based on Fourier coefficients, combined with an FDR multiple-comparison procedure. We then apply model-based clustering, K-means, PAM, SOM, hierarchical Ward clustering, and fuzzy clustering to the selected genes and compare the characteristics of each method. As internal validity measures of the clustering results, connectivity, the Dunn index, and silhouette values are computed and compared. A biological interpretation based on GO analysis is also included.
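
The abstract does not say which FDR procedure is applied in the screening step. As a minimal sketch, assuming the Benjamini-Hochberg step-up procedure and assuming that one p-value per gene has already been obtained from the Fourier-coefficient test, the selection of genes could look like the following. The function name `benjamini_hochberg` and the level `alpha=0.05` are illustrative choices, not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at FDR level alpha.

    Assumption: p_values holds one p-value per gene; how those p-values
    are computed (here, from a Fourier-coefficient test) is outside
    this sketch.
    """
    m = len(p_values)
    # Gene indices ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical p-values for ten genes (illustrative only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(example_p, alpha=0.05)
print(selected)
```

Only the genes returned by such a screen would then be passed to the clustering methods being compared.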
