Comparison of covariance thresholding methods in gene set analysis

  • Park, Sora (Department of Statistics, Pusan National University)
  • Kim, Kipoong (Department of Statistics, Pusan National University)
  • Sun, Hokeun (Department of Statistics, Pusan National University)
  • Received: 2022.03.22
  • Reviewed: 2022.05.09
  • Published: 2022.09.30

Abstract

In gene set analysis with microarray expression data, a group of genes, such as a gene regulatory pathway or a signaling pathway, is often tested for whether it contains differentially expressed (DE) or differentially co-expressed (DC) genes between two biological conditions. Recently, a statistical test based on covariance estimation has been proposed to identify DC genes. In particular, regularizing the covariance matrix by hard thresholding improved the power of the test when the proportion of DC genes within a biological pathway is relatively small. In this article, we compare covariance thresholding methods based on four regularization penalties: the lasso, hard, smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). In extensive simulation studies, we found that both SCAD and MCP thresholding can outperform hard thresholding when the proportion of DC genes is extremely small and the number of genes in a biological pathway is much greater than the sample size. We also applied the four thresholding methods to three microarray gene expression data sets, related to mutant p53 transcriptional activity and to breast cancer epithelium and stroma, and compared the genetic pathways identified by each method.
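
To make the four penalties concrete, the following is a minimal Python/NumPy sketch of the elementwise thresholding rules (soft thresholding for the lasso, hard, SCAD and MCP), applied to the off-diagonal entries of a sample covariance matrix while the variances on the diagonal are left untouched. The defaults a = 3.7 for SCAD and gamma = 3 for MCP, the fixed threshold lam, and all function names are assumptions chosen for illustration. The dc_stat and perm_pvalue helpers are a hypothetical two-group statistic (squared Frobenius distance between thresholded covariance matrices, with a label-permutation p-value); they are not the test used in the article.

```python
import numpy as np


def soft(z, lam):
    """Soft thresholding (lasso rule): shrink every entry toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def hard(z, lam):
    """Hard thresholding: keep entries with |z| > lam, set the rest to zero."""
    return np.where(np.abs(z) > lam, z, 0.0)


def scad(z, lam, a=3.7):
    """SCAD thresholding: soft rule up to 2*lam, linear blend up to a*lam, identity beyond."""
    absz = np.abs(z)
    middle = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)
    return np.where(absz <= 2 * lam, soft(z, lam),
                    np.where(absz <= a * lam, middle, z))


def mcp(z, lam, gamma=3.0):
    """MCP thresholding: rescaled soft rule up to gamma*lam, identity beyond (gamma > 1)."""
    return np.where(np.abs(z) <= gamma * lam, soft(z, lam) / (1 - 1 / gamma), z)


def threshold_cov(x, lam, rule):
    """Threshold the off-diagonal sample covariances of x (n samples x p genes)."""
    s = np.cov(x, rowvar=False)
    t = rule(s, lam)
    np.fill_diagonal(t, np.diag(s))  # variances are kept as estimated
    return t


def dc_stat(x1, x2, lam, rule):
    """Hypothetical statistic: squared Frobenius distance of thresholded covariances."""
    d = threshold_cov(x1, lam, rule) - threshold_cov(x2, lam, rule)
    return float(np.sum(d ** 2))


def perm_pvalue(x1, x2, lam, rule, n_perm=200, seed=0):
    """Permutation p-value obtained by shuffling group labels across samples."""
    rng = np.random.default_rng(seed)
    observed = dc_stat(x1, x2, lam, rule)
    pooled = np.vstack([x1, x2])
    n1 = x1.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        if dc_stat(pooled[idx[:n1]], pooled[idx[n1:]], lam, rule) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated pathway: 20 samples per group, 50 genes (p > n), no true DC genes.
    x1 = rng.standard_normal((20, 50))
    x2 = rng.standard_normal((20, 50))
    for name, rule in [("lasso", soft), ("hard", hard), ("SCAD", scad), ("MCP", mcp)]:
        print(name, perm_pvalue(x1, x2, lam=0.3, rule=rule))
```

The sketch keeps lam fixed for brevity; in practice the threshold would usually be chosen by a data-driven rule such as cross-validation. Note how SCAD and MCP leave large covariances unshrunk, like hard thresholding, while still shrinking smaller ones continuously, like the lasso.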

Keywords

Acknowledgement

This work was supported by a 2-Year Research Grant of Pusan National University.
