Screening and Clustering for Time-course Yeast Microarray Gene Expression Data using Gaussian Process Regression

  • Kim, Jaehee (Department of Statistics, Duksung Women's University)
  • Kim, Taehoun (Department of PrePharmMed, Duksung Women's University)
  • Received : 2012.12.28
  • Accepted : 2013.04.26
  • Published : 2013.06.30

Abstract

This article introduces Gaussian process regression and demonstrates its application to time-course microarray gene expression data. Genes are screened by the ratio of log marginal likelihoods obtained from Gaussian process regression fits. A simulation study evaluates this screening method with measures such as sensitivity, specificity, and false discovery rate to show its usefulness. For real yeast cell cycle microarray data, Gaussian process regression with a squared exponential covariance kernel is fitted to each gene, differentially expressed genes are screened by the log marginal likelihood ratio, and the fits for the nine top-ranking genes are shown. The screened genes are then grouped by Gaussian model-based clustering, and silhouette values are computed to assess cluster validity.
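The screening idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the score compares the best log marginal likelihood of a Gaussian process with a squared exponential kernel against that of a noise-only model, and that hyperparameters are chosen from a small fixed grid rather than optimized. The function names, the grid values, and the toy data are all hypothetical.

```python
import numpy as np


def se_kernel(t1, t2, signal_var, length_scale):
    """Squared exponential covariance between two vectors of time points."""
    diff = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


def log_marginal_likelihood(t, y, signal_var, length_scale, noise_var):
    """Log marginal likelihood of y under a zero-mean GP with Gaussian noise."""
    n = len(t)
    K = se_kernel(t, t, signal_var, length_scale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                 # -0.5 * log det K
            - 0.5 * n * np.log(2.0 * np.pi))


def screening_score(t, y,
                    length_scales=(1.0, 2.0, 4.0),       # assumed grid
                    signal_fractions=(0.5, 0.8, 0.95)):  # assumed grid
    """Best structured-model log likelihood minus the noise-only one."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                   # zero-mean GP, so center the profile
    total_var = y.var() + 1e-12        # guard against constant profiles
    # Noise-only model: all variance is attributed to observation noise.
    noise_only = log_marginal_likelihood(t, y, 0.0, 1.0, total_var)
    # Structured model: split the variance between signal and noise.
    best = max(
        log_marginal_likelihood(t, y, f * total_var, ell, (1.0 - f) * total_var)
        for ell in length_scales
        for f in signal_fractions
    )
    return best - noise_only


# Toy data (assumption): one smooth periodic profile and one pure-noise profile.
rng = np.random.default_rng(0)
t = np.arange(18.0)
profiles = {
    "periodic": np.sin(2.0 * np.pi * t / 9.0) + 0.1 * rng.standard_normal(t.size),
    "flat": rng.standard_normal(t.size),
}
scores = {name: screening_score(t, y) for name, y in profiles.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Genes with larger scores show more temporal structure than noise alone would explain, so ranking by the score and keeping the top genes gives a screened set that could then be passed to a model-based clustering step.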
