Identifying statistically significant gene sets based on differential expression and differential coexpression

Searching for significant gene sets by considering differential expression and differential coexpression

  • Lee, Sunho (Division of Mathematics and Statistics, Sejong University)
  • 이선호 (세종대학교 수학통계학부)
  • Received : 2015.12.22
  • Accepted : 2016.02.29
  • Published : 2016.04.30

Abstract

Gene set analysis that uses biological information is expected to produce more interpretable results, because the occurrence of tumors (or other diseases) is believed to be associated with the regulation of related genes. Many methods have been developed to identify gene sets that differ significantly across phenotypes; however, most focus exclusively on either differential gene expression or the differential correlation structure within the gene set. This research proposes a new method that simultaneously considers the differential expression of genes and their differential coexpression with multiple genes in the gene set. The new method is illustrated with a real microarray data set, p53; a simulation study then compares its type I error rate and power with those of GSEA, SAMGS, GSCA and GSNCA.

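Because the regulation of expression among correlated genes affects the occurrence of diseases and tumors, the analysis of gene sets that share common biological features has come into favor over single-gene analysis, and it yields results that are easier to interpret biologically. Several methods exist for finding gene sets that differ significantly across phenotypes, but most of them either look for phenotype-related differences in the expression of the genes in the set or test whether the coexpression structure among those genes differs. This study proposes a method that considers both differential expression and differential coexpression, and evaluates its performance using the p53 gene data and simulated data.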
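The abstract does not give the paper's test statistic, so the following is only a minimal sketch of the general idea of scoring a gene set on both differential expression and differential coexpression. It assumes two expression matrices (samples by genes), a sum of squared mean differences for expression, a sum of squared differences in pairwise Pearson correlations for coexpression, and a label-permutation null. The function names, the way the two components are standardized and added, and the default number of permutations are all illustrative assumptions, not the author's method.

```python
import numpy as np

def de_dc_components(x, y):
    """Return (DE, DC) for one gene set.

    x, y: arrays of shape (samples, genes), one per phenotype.
    Both statistics are illustrative choices, not the paper's.
    """
    # Differential expression: squared differences of per-gene means.
    de = np.sum((x.mean(axis=0) - y.mean(axis=0)) ** 2)
    # Differential coexpression: squared differences of pairwise
    # Pearson correlations (upper triangle only, no diagonal).
    rx = np.corrcoef(x, rowvar=False)
    ry = np.corrcoef(y, rowvar=False)
    iu = np.triu_indices_from(rx, k=1)
    dc = np.sum((rx[iu] - ry[iu]) ** 2)
    return de, dc

def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Permutation p-value for a combined DE + DC score (assumed form)."""
    rng = np.random.default_rng(seed)
    data = np.vstack([x, y])
    n_x = x.shape[0]
    obs = np.array(de_dc_components(x, y))
    null = np.empty((n_perm, 2))
    for b in range(n_perm):
        # Shuffle phenotype labels by permuting the sample rows.
        idx = rng.permutation(data.shape[0])
        null[b] = de_dc_components(data[idx[:n_x]], data[idx[n_x:]])
    # Put both components on a common scale using the permutation null,
    # then add them so that either kind of difference raises the score.
    mu, sd = null.mean(axis=0), null.std(axis=0)
    obs_score = np.sum((obs - mu) / sd)
    null_score = np.sum((null - mu) / sd, axis=1)
    return (1 + np.sum(null_score >= obs_score)) / (1 + n_perm)
```

Calling `permutation_pvalue(x, y)` on two phenotype groups returns a value in (0, 1]; repeating the call over many simulated data sets generated under the null or under an alternative would give empirical estimates of type I error rate and power, which is the kind of comparison the abstract describes.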

References

  1. Choi, Y. and Kendziorski, C. (2009). Statistical methods for gene set co-expression analysis, Bioinformatics, 25, 2780-2786. https://doi.org/10.1093/bioinformatics/btp502
  2. Dinu, I., Potter, J. D., Mueller, T., Liu, Q., Adewale, A. J., Jhangri, G. S., Einecke, G., Famulski, K. S., Halloran, P. and Yasui, Y. (2007). Improving gene set analysis of microarray data by SAM-GS, BMC Bioinformatics, 8, 242. https://doi.org/10.1186/1471-2105-8-242
  3. Draghici, S., Khatri, P., Martins, R. P., Ostermeier, G. C., and Krawetz, S. A. (2003). Global functional profiling of gene expression, Genomics, 81, 98-104. https://doi.org/10.1016/S0888-7543(02)00021-6
  4. Efron, B. and Tibshirani, R. (2007). On testing the significance of sets of genes, Annals of Applied Statistics, 1, 107-129. https://doi.org/10.1214/07-AOAS101
  5. Goeman, J., van de Geer, S., de Kort, F., and van Houwelingen, H. (2004). A global test for groups of genes: testing association with a clinical outcome, Bioinformatics, 20, 93-99. https://doi.org/10.1093/bioinformatics/btg382
  6. Goeman, J., Oosting, J., Cleton-Jansen, A. M., Anninga, J. K., and van Houwelingen, H. C. (2005). Testing association of a pathway with survival using gene expression data, Bioinformatics, 21, 1950-1957. https://doi.org/10.1093/bioinformatics/bti267
  7. Jung, S. and Kim, S. (2014). EDDY: a novel statistical gene set test method to detect differential genetic dependencies, Nucleic Acids Research, 42, e60. https://doi.org/10.1093/nar/gku099
  8. Khatri, P., Bhavsar, P., Bawa, G., and Draghici, S. (2004). Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments, Nucleic Acids Research, 32, 449-456.
  9. Kim, B. S., Jang, J. S., Kim, S. C., and Lim, J. (2009). A report on the inter-gene correlations in cDNA microarray data sets, The Korean Journal of Applied Statistics, 22, 617-626. https://doi.org/10.5351/KJAS.2009.22.3.617
  10. Kim, S. Y. and Volsky, D. (2005). PAGE: parametric analysis of gene set enrichment, BMC Bioinformatics, 6, 144.
  11. Klebanov, L. and Yakovlev, A. (2007). Diverse correlation structures in gene expression data and their utility in improving statistical inference, The Annals of Applied Statistics, 1, 538-559. https://doi.org/10.1214/07-AOAS120
  12. Lai, Y., Wu, B., Chen, L., and Zhao, H. (2004). A statistical method for identifying differential gene-gene coexpression patterns, Bioinformatics, 20, 3146-3155. https://doi.org/10.1093/bioinformatics/bth379
  13. Lee, S. H., Lee, S. K., and Lee, K. H. (2009). Developing a parametric method for testing the significance of gene sets in microarray data analysis, Communications for Statistical Applications and Methods, 16, 397-408. https://doi.org/10.5351/CKSS.2009.16.3.397
  14. Ma, H., Schadt, E. E., Kaplan, L. M., and Zhao, H. (2011). COSINE: condition-specific sub-network identification using a global optimization method, Bioinformatics, 27, 1290-1298. https://doi.org/10.1093/bioinformatics/btr136
  15. Maciejewski, H. (2014). Gene set analysis methods: statistical models and methodological differences, Briefings in Bioinformatics, 15, 504-518. https://doi.org/10.1093/bib/bbt002
  16. Meyer, C. (2001). Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics (SIAM), Philadelphia.
  17. Mootha, V. K., Lindgren, C. M., Eriksson, K. F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstrale, M., Laurila, E., Houstis, N., Daly, M. J., Patterson, N., Mesirov, J. P., Golub, T. R., Tamayo, P., Spiegelman, B., Lander, E. S., Hirschhorn, J. N., Altshuler, D., and Groop, L. C. (2003). PGC-1-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes, Nature Genetics, 34, 267-273. https://doi.org/10.1038/ng1180
  18. Newton, M. A., Quintana, F. A., den Boon, J. A. (2007). Random set methods identify distinct aspects of the enrichment signal in gene-set analysis, Annals of Applied Statistics, 1, 85-106. https://doi.org/10.1214/07-AOAS104
  19. Qiu, X., Klebanov, L., and Yakovlev, A. (2005). Correlation between gene expression levels and limitations of the empirical Bayes methodology for finding differentially expressed genes, Statistical Applications in Genetics and Molecular Biology, 4, Article 34.
  20. Rahmatallah, Y., Emmert-Streib, F. and Glazko, G. (2014). Gene sets net correlations analysis (GSNCA): a multivariate differential coexpression test for gene sets, Bioinformatics, 30, 360-368. https://doi.org/10.1093/bioinformatics/btt687
  21. Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., and Mesirov, J. P. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles, Proceedings of the National Academy of Sciences, 102, 15545-15550. https://doi.org/10.1073/pnas.0506580102
  22. Tesson, B. M., Breitling, R., and Jansen, R. C. (2010). DiffCoEx: a simple and sensitive method to find differentially coexpressed gene modules, BMC Bioinformatics, 11, 497. https://doi.org/10.1186/1471-2105-11-497
  23. Tusher, V. G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response, Proceedings of the National Academy of Sciences, 98, 5116-5121. https://doi.org/10.1073/pnas.091062498