A study on alternatives to the permutation test in gene-set analysis

  • Lee, Sunho (Division of Mathematics and Statistics, Sejong University)
  • 이선호 (세종대학교 수학통계학부)
  • Received: 2018.01.23
  • Accepted: 2018.02.26
  • Published: 2018.04.30

Abstract

Gene-set analysis of microarray data has advantages in interpreting biological function and increasing statistical power. Many statistical methods have been proposed for detecting significant gene sets that show associations between genes and phenotypes, but there is no consensus on which method is best, and permutation-based tests are regarded as the standard tools. When many gene sets are tested simultaneously, multiple testing requires a large number of random permutations, at a high computational cost. In this paper, several parametric approximations are considered as alternatives to the permutation distribution. Within a general framework, the moment-based gene set test performed best, yielding p-values close to those of the permutation test in far less time.
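The cost of permutation under multiple testing can be made concrete with a short calculation. This is only a sketch under assumptions not stated in the abstract: the permutation p-value is taken to be the usual add-one estimate, significance is judged by a Bonferroni correction, and the symbols B (number of permutations), m (number of gene sets), and α (overall level) are introduced here for illustration.

```latex
% Smallest p-value attainable with B random permutations (add-one estimator):
p_{\min} = \frac{1}{B+1}
% For a set to be able to pass a Bonferroni threshold \alpha/m:
\frac{1}{B+1} \le \frac{\alpha}{m}
\;\Longleftrightarrow\;
B \ge \frac{m}{\alpha} - 1
% Example: m = 1000 gene sets and \alpha = 0.05 give B \ge 19999 permutations per set.
```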

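Gene-set analysis of microarray data has been studied actively because, compared with single-gene analysis, it can increase power and makes results easier to interpret. The search for gene sets that differ significantly by phenotype gives results that vary with the rationale behind each test statistic, but the permutation test based on the sum of squared t-statistics is generally regarded as the most reasonable choice. However, multiple testing is unavoidable in gene-set analysis, and discriminating among the significance levels of many sets requires the permutation test to generate a large number of permuted samples, which is time-consuming. After examining parametric methods as substitutes for the permutation test, we confirmed that the moment-based approximation greatly shortens the time needed to compute each set's p-value and gives p-values that agree almost exactly, in both magnitude and rank, with those obtained from the permutation test.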
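As an illustration of the baseline procedure the abstract refers to, the following is a minimal sketch of a permutation test for one gene set using the sum of squared t-statistics. It is not the paper's implementation: the function names, the choice of the Welch form of the t-statistic, the add-one p-value estimate, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def t_statistics(X, labels):
    """Welch two-sample t-statistic per gene; rows of X are genes, columns are samples.
    (The Welch form is an assumption; the abstract does not say which t-statistic is used.)"""
    a, b = X[:, labels == 0], X[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

def permutation_pvalue(X, labels, gene_idx, n_perm=1000, seed=0):
    """Permutation p-value for one gene set, statistic = sum of squared t-statistics."""
    rng = np.random.default_rng(seed)
    Xs = X[gene_idx]                      # expression rows of the genes in the set
    observed = np.sum(t_statistics(Xs, labels) ** 2)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)    # shuffle phenotype labels across samples
        if np.sum(t_statistics(Xs, perm) ** 2) >= observed:
            exceed += 1
    # Add-one estimate, so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)

# Simulated data (hypothetical): 200 genes, 10 samples per phenotype group.
rng = np.random.default_rng(42)
labels = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(200, 20))
X[:10, labels == 1] += 2.0               # genes 0-9 form a truly shifted set

print("shifted set p-value:", permutation_pvalue(X, labels, np.arange(10)))
print("null set p-value:   ", permutation_pvalue(X, labels, np.arange(100, 110)))
```

Every permutation recomputes all t-statistics in the set, so the running time grows linearly with both the number of permutations and the number of gene sets tested. That per-set loop is the cost a parametric approximation of the permutation distribution is meant to avoid.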
