A Report on the Inter-Gene Correlations in cDNA Microarray Data Sets

  • Published : 2009.06.30

Abstract

A series of recent papers has reported that inter-gene correlations in Affymetrix microarray data sets are strong and long-ranged, and that the assumption of independence or weak dependence among gene expression signals, often adopted without justification, conflicts with actual data. Qui et al. (2005b) showed that the nonparametric empirical Bayes method, in which test statistics are pooled across genes for statistical inference, yields a large variance in the number of genes declared differentially expressed, and attributed this effect to strong and long-ranged inter-gene correlations. Klebanov and Yakovlev (2007) demonstrated that inter-gene correlations are a rich source of information rather than a nuisance in statistical analysis; by transforming the original gene expression sequence they constructed a sequence of independent random variables, which they called a ${\delta}$-sequence. Using two cDNA microarray data sets produced in Korea, we report that strong and long-ranged inter-gene correlations are also present in cDNA microarray data and that a ${\delta}$-sequence of independent variables can likewise be derived from such data. We suggest that inter-gene correlations be taken into account in future analyses of cDNA microarray data sets.

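A series of recently reported studies shows that inter-gene correlations in Affymetrix microarray data are strong and long-ranged, and that the conventional "convenient" assumption, namely that inter-gene correlations are very weak so that genes can be treated as approximately independent, is unrealistic. Qui et al. (2005b) report that applying the so-called nonparametric empirical Bayes method, which pools the test statistics of individual genes for statistical inference, inflates the variance of the number of detected differentially expressed genes, and they identify strong inter-gene correlation as the cause of this instability in the variance. Klebanov and Yakovlev (2007) further argue that inter-gene correlation is a useful source of information rather than a factor that complicates statistical analysis, and that a suitable transformation yields a sequence that maintains approximate independence, which they called the ${\delta}$-sequence. In this report we confirm that, in two cDNA microarray data sets produced in Korea, inter-gene correlations are fairly strong and long-ranged, and we report that a ${\delta}$-sequence for which approximate independence can be assumed is also found in cDNA microarrays. The report emphasizes that inter-gene correlations should also be taken into account in future analyses of cDNA microarray data.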
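
The idea summarized in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis and uses no real microarray data: it generates synthetic log-expression values in which every gene shares one array-level effect (an assumption chosen only to produce strong, long-ranged correlations), then forms differences between non-overlapping gene pairs as a simplified reading of the ${\delta}$-sequence of Klebanov and Yakovlev (2007). The function names, the variance-based pairing rule, and all simulation parameters are hypothetical.

```python
import numpy as np


def mean_abs_offdiag_corr(x):
    """Mean absolute Pearson correlation over all distinct row pairs.

    x has shape (n_rows, n_arrays); each row is one gene (or one delta).
    """
    r = np.corrcoef(x)
    upper = np.triu_indices_from(r, k=1)
    return float(np.mean(np.abs(r[upper])))


def delta_sequence(log_expr):
    """Differences of log-expression between non-overlapping gene pairs.

    Assumption: genes are sorted by sample variance so that each pair
    combines genes of similar variability. This is a simplified stand-in
    for the construction described by Klebanov and Yakovlev (2007).
    """
    order = np.argsort(log_expr.var(axis=1, ddof=1))
    x = log_expr[order]
    n_pairs = x.shape[0] // 2
    # Rows 1, 3, 5, ... minus rows 0, 2, 4, ... (each gene used once).
    return x[1:2 * n_pairs:2] - x[0:2 * n_pairs:2]


# Synthetic data: a shared array-level effect makes all genes correlated.
rng = np.random.default_rng(0)
n_genes, n_arrays = 200, 40
array_effect = rng.normal(0.0, 1.0, size=(1, n_arrays))
noise = rng.normal(0.0, 0.5, size=(n_genes, n_arrays))
log_expr = array_effect + noise

raw_corr = mean_abs_offdiag_corr(log_expr)
delta = delta_sequence(log_expr)
delta_corr = mean_abs_offdiag_corr(delta)

print(f"mean |r| among genes:  {raw_corr:.3f}")
print(f"mean |r| among deltas: {delta_corr:.3f}")
```

In this toy setting the shared effect cancels in each pairwise difference, so the correlations among the differences are much weaker than those among the original genes. Real data need not behave this cleanly, which is why the report checks the property empirically on cDNA arrays.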

References

  1. Efron, B. (2003). Robbins, empirical Bayes and microarrays, The Annals of Statistics, 31, 366-378 https://doi.org/10.1214/aos/1051027871
  2. Efron, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis, Journal of the American Statistical Association, 99, 96-104 https://doi.org/10.1198/016214504000000089
  3. Efron, B. (2007). Correlation and large-scale simultaneous significance testing, Journal of the American Statistical Association, 102, 93-103 https://doi.org/10.1198/016214506000001211
  4. Efron, B., Tibshirani, R., Storey, J. D. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment, Journal of the American Statistical Association, 96, 1151-1160 https://doi.org/10.1198/016214501753382129
  5. Frantz, S. (2005). An array of problems, Nature Reviews Drug Discovery, 4, 302-303 https://doi.org/10.1038/nrd1746
  6. Kim, B. S., Kim, I., Lee, S., Kim, S., Rha, S. Y. and Chung, H. C. (2005). Statistical methods of translating microarray data into clinically relevant diagnostic information in colorectal cancer, Bioinformatics, 21, 517-528 https://doi.org/10.1093/bioinformatics/bti029
  7. Klebanov, L., Jordan, C. and Yakovlev, A. (2006). A new type of stochastic dependence revealed in gene expression data, Statistical Applications in Genetics and Molecular Biology, 5, Article 7
  8. Klebanov, L. and Yakovlev, A. (2006). Treating expression levels of different genes as a sample in microarray data analysis: Is it worth a risk?, Statistical Applications in Genetics and Molecular Biology, 5, Article 9
  9. Klebanov, L. and Yakovlev, A. (2007). Diverse correlation structures in gene expression data and their utility in improving statistical inference, The Annals of Applied Statistics, 1, 538-559 https://doi.org/10.1214/07-AOAS120
  10. Marshall, E. (2004). Getting the noise out of gene arrays, Science, 306, 630-631 https://doi.org/10.1126/science.306.5696.630
  11. Qui, X., Brooks, A. I., Klebanov, L. and Yakovlev, A. (2005a). The effects of normalization on the correlation structure of microarray data, BMC Bioinformatics, 6, 120 https://doi.org/10.1186/1471-2105-6-120
  12. Qui, X., Klebanov, L. and Yakovlev, A. (2005b). Correlation between gene expression levels and limitations of the empirical Bayes methodology for finding differentially expressed genes, Statistical Applications in Genetics and Molecular Biology, 4, Article 34
  13. Qui, X., Xiao, Y., Gordon, A. and Yakovlev, A. (2006). Assessing stability of gene selection in microarray data analysis, BMC Bioinformatics, 7, 50 https://doi.org/10.1186/1471-2105-7-50
  14. Qui, X. and Yakovlev, A. (2006). Some comments on instability of false discovery rate estimation, Journal of Bioinformatics and Computational Biology, 4, 1057-1068 https://doi.org/10.1142/S0219720006002338
  15. Stolovitzky, G. (2003). Gene selection in microarray data: The elephant, the blind men and our algorithm, Current Opinion in Structural Biology, 13, 370-376 https://doi.org/10.1016/S0959-440X(03)00078-2
  16. Yang, S., Jeung, H. C., Jeong, H. J., Choi, Y. H., Kim, J. E., Jung, J. J., Rha, S. Y., Yang, W. I. and Chung, H. C. (2007a). Identification of genes with correlated patterns of variations in DNA copy number and gene expression level in gastric cancer, Genomics, 89, 451-459 https://doi.org/10.1016/j.ygeno.2006.12.001
  17. Yang, S., Shin, J., Park, K. H., Jeung, H-C., Rha, S. Y., Noh, S. H., Yang, W. I. and Chung, H. C. (2007b). Molecular basis of the difference between normal and tumor tissues of gastric cancer, Biochimica et Biophysica Acta, 1772, 1033-1040 https://doi.org/10.1016/j.bbadis.2007.05.005

Cited by

  1. Identifying statistically significant gene sets based on differential expression and differential coexpression vol.29, pp.3, 2016, https://doi.org/10.5351/KJAS.2016.29.3.437