http://dx.doi.org/10.14348/molcells.2019.0065

NGSEA: Network-Based Gene Set Enrichment Analysis for Interpreting Gene Expression Phenotypes with Functional Gene Sets  

Han, Heonjong (Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University)
Lee, Sangyoung (Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University)
Lee, Insuk (Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University)
Abstract
Gene set enrichment analysis (GSEA) is a popular tool for identifying the biological processes underlying clinical samples from their gene expression phenotypes. GSEA measures the enrichment of annotated gene sets that represent biological processes among differentially expressed genes (DEGs) in clinical samples. However, GSEA may be suboptimal for functional gene sets, because DEGs from the expression dataset may not be functional genes per se but dysregulated genes perturbed by bona fide functional genes. To overcome this shortcoming, we developed network-based GSEA (NGSEA), which measures the enrichment score of functional gene sets using the expression difference of not only individual genes but also their neighbors in the functional network. We found that NGSEA outperformed GSEA in identifying pathway gene sets for matched gene expression phenotypes. We also observed that, compared with another method, Connectivity Map, NGSEA substantially improved the ability to retrieve known anti-cancer drugs from patient-derived gene expression data using drug-target gene sets. In addition, we repurposed FDA-approved drugs using NGSEA and experimentally validated budesonide as a chemical with anti-cancer effects for colorectal cancer. We therefore expect that NGSEA will facilitate both pathway interpretation of gene expression phenotypes and anti-cancer drug repositioning. NGSEA is freely available at www.inetbio.org/ngsea.
Keywords
drug repositioning; gene network; gene set enrichment analysis; network-based analysis; pathway analysis
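
The abstract's central idea, scoring each gene by its own expression difference together with that of its network neighbors before computing a GSEA-style enrichment score, can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so the combination rule used here (a gene's absolute log2 ratio plus the mean absolute log2 ratio of its neighbors) is an assumption, as are the function names `network_smoothed_scores` and `enrichment_score` and the toy data. The running-sum statistic follows the general weighted Kolmogorov-Smirnov-like form commonly associated with GSEA and is not taken from the NGSEA implementation.

```python
from collections import defaultdict


def network_smoothed_scores(log2_ratio, edges):
    """Score each gene by |own log2 ratio| + mean |log2 ratio| of its neighbors.

    ASSUMPTION: this combination rule is illustrative; the abstract only says
    that neighbors' expression differences are also used.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        # Keep only edges between distinct genes that both have expression data.
        if a != b and a in log2_ratio and b in log2_ratio:
            neighbors[a].add(b)
            neighbors[b].add(a)
    scores = {}
    for gene, value in log2_ratio.items():
        nbrs = neighbors.get(gene, ())
        nbr_mean = sum(abs(log2_ratio[n]) for n in nbrs) / len(nbrs) if nbrs else 0.0
        scores[gene] = abs(value) + nbr_mean
    return scores


def enrichment_score(scores, gene_set, weight=1.0):
    """GSEA-style running-sum statistic over genes ranked by descending score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = [g for g in ranked if g in gene_set]
    if not hits or len(hits) == len(ranked):
        return 0.0
    hit_total = sum(abs(scores[g]) ** weight for g in hits)
    if hit_total == 0:
        return 0.0
    miss_step = 1.0 / (len(ranked) - len(hits))
    running = best = 0.0
    for g in ranked:
        if g in gene_set:
            running += abs(scores[g]) ** weight / hit_total  # step up on a hit
        else:
            running -= miss_step  # step down on a miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


if __name__ == "__main__":
    # Hypothetical toy data: gene A barely changes itself, but its neighbors do.
    toy_ratio = {"A": 0.1, "B": 2.0, "C": -1.8, "D": 0.0, "E": 0.2}
    toy_edges = [("A", "B"), ("A", "C"), ("D", "E")]
    plain = {g: abs(v) for g, v in toy_ratio.items()}
    smoothed = network_smoothed_scores(toy_ratio, toy_edges)
    print("plain ES:", enrichment_score(plain, {"A"}))
    print("network ES:", enrichment_score(smoothed, {"A"}))
```

In this toy case, gene A has almost no expression change of its own but is connected to two strongly dysregulated genes, so the network-aware score moves it toward the top of the ranking. That mirrors the abstract's motivation: a functional gene may show up mainly through the genes it perturbs.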