A Study on Bias Effect on Model Selection Criteria in Graphical Lasso

  • Received: 2018.10.08
  • Accepted: 2018.11.12
  • Published: 2018.11.30

Abstract

Graphical lasso is one of the most popular methods for estimating a sparse precision matrix, that is, the inverse of a covariance matrix. The objective function of graphical lasso imposes an ${\ell}_1$-penalty on the (vectorized) precision matrix, and a tuning parameter controls the strength of the penalization. Selecting this tuning parameter matters both in practice and in theory, because the performance of the estimator depends on an appropriate choice. Information criteria (e.g., AIC, BIC, or extended BIC) have been widely used for this purpose, but they require an asymptotically unbiased estimator in order to select the optimal tuning parameter. Consequently, the bias of the ${\ell}_1$-regularized estimate in the graphical lasso may lead to suboptimal tuning. In this paper, we propose a two-stage bias-correction procedure for the graphical lasso: the first stage runs the usual graphical lasso, and the second stage reruns the procedure with the additional constraint that entries estimated as zero in the first stage remain zero. Our simulation study and real-data example show that the proposed bias correction improves both edge recovery and estimation error compared with the single-stage graphical lasso.
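To make the two-stage idea concrete, here is a minimal, dependency-free sketch. It is an illustration under stated assumptions, not the authors' implementation. `glasso` runs Friedman, Hastie and Tibshirani's block coordinate descent for the objective $\log\det\Theta - \mathrm{tr}(S\Theta) - \lambda\sum_{ij}|\Theta_{ij}|$ (diagonal penalized). It takes an optional `zero_mask` that holds chosen entries at zero. `ebic` evaluates Foygel and Drton's extended BIC, $-2\ell_n(\hat\Theta) + |E|\log n + 4\gamma|E|\log p$. The abstract does not say which penalty the second stage uses. Because the stage-1 solution already satisfies the zero constraints and the problem is strictly convex, rerunning with the same $\lambda$ would return the same estimate. The hypothetical `two_stage` helper therefore takes a separate `lam2` (assumed 0 by default, i.e. a constrained maximum-likelihood refit). All function names, the toy matrix `S`, `n = 50`, and `gamma = 0.5` are assumptions for illustration.

```python
import math


def soft_threshold(x, t):
    """Soft-thresholding operator: sign(x) * max(|x| - t, 0)."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def glasso(S, lam, zero_mask=None, max_outer=500, max_inner=500, tol=1e-10):
    """Graphical lasso by block coordinate descent (Friedman et al. style).

    Maximizes log det(Theta) - tr(S Theta) - lam * sum_ij |Theta_ij|
    (diagonal included). Entries with zero_mask[i][k] True are held at zero.
    Returns (Theta, W), where W is the fitted covariance, approx. inv(Theta).
    """
    p = len(S)
    W = [[S[i][k] + (lam if i == k else 0.0) for k in range(p)] for i in range(p)]
    B = [[0.0] * p for _ in range(p)]  # B[j][k]: lasso coefficient, column j
    for _ in range(max_outer):
        max_change = 0.0
        for j in range(p):
            idx = [k for k in range(p) if k != j]
            beta = B[j]
            # Coordinate descent on: min_b 0.5 b'W11 b - b's12 + lam ||b||_1
            for _ in range(max_inner):
                delta = 0.0
                for k in idx:
                    if zero_mask is not None and zero_mask[j][k]:
                        beta[k] = 0.0  # constrained edge stays absent
                        continue
                    r = S[k][j] - sum(W[k][l] * beta[l] for l in idx if l != k)
                    new = soft_threshold(r, lam) / W[k][k]
                    delta = max(delta, abs(new - beta[k]))
                    beta[k] = new
                if delta < tol:
                    break
            # Refresh the off-diagonal row/column of W: w12 = W11 @ beta
            for k in idx:
                w = sum(W[k][l] * beta[l] for l in idx)
                max_change = max(max_change, abs(w - W[k][j]))
                W[k][j] = W[j][k] = w
        if max_change < tol:
            break
    # Recover Theta column by column from W and the lasso coefficients.
    Theta = [[0.0] * p for _ in range(p)]
    for j in range(p):
        idx = [k for k in range(p) if k != j]
        t_jj = 1.0 / (W[j][j] - sum(W[k][j] * B[j][k] for k in idx))
        Theta[j][j] = t_jj
        for k in idx:
            Theta[k][j] = -B[j][k] * t_jj
    # Symmetrize: column-wise recovery is symmetric only up to the tolerance.
    Theta = [[0.5 * (Theta[i][k] + Theta[k][i]) for k in range(p)] for i in range(p)]
    return Theta, W


def logdet(A):
    """log det of a positive-definite matrix via Gaussian elimination."""
    M = [row[:] for row in A]
    total = 0.0
    for i in range(len(M)):
        total += math.log(M[i][i])
        for r in range(i + 1, len(M)):
            f = M[r][i] / M[i][i]
            for c in range(i, len(M)):
                M[r][c] -= f * M[i][c]
    return total


def ebic(Theta, S, n, gamma=0.5):
    """Extended BIC (Foygel and Drton) for a Gaussian graphical model."""
    p = len(S)
    trace = sum(S[i][k] * Theta[k][i] for i in range(p) for k in range(p))
    loglik = 0.5 * n * (logdet(Theta) - trace)  # up to an additive constant
    edges = sum(1 for i in range(p) for k in range(i + 1, p) if Theta[i][k] != 0.0)
    return -2.0 * loglik + edges * math.log(n) + 4.0 * gamma * edges * math.log(p)


def two_stage(S, lam, lam2=0.0):
    """Hypothetical two-stage refit: stage 1 is the usual graphical lasso;
    stage 2 reruns it with stage-1 zeros held at zero and penalty lam2."""
    theta1, _ = glasso(S, lam)
    p = len(S)
    mask = [[i != k and theta1[i][k] == 0.0 for k in range(p)] for i in range(p)]
    theta2, _ = glasso(S, lam2, zero_mask=mask)
    return theta1, theta2


if __name__ == "__main__":
    S = [[2.0, 0.5, 0.2], [0.5, 1.5, 0.3], [0.2, 0.3, 1.0]]  # toy covariance
    theta1, theta2 = two_stage(S, lam=0.25)
    print("EBIC stage 1:", ebic(theta1, S, n=50))
    print("EBIC stage 2:", ebic(theta2, S, n=50))
```

The stage-1 estimate is feasible for the constrained stage-2 problem, so with `lam2 = 0` the refit's log-likelihood can be no lower. Since both stages share the same edge set, the refit's EBIC can be no higher. This is one way the shrinkage bias of stage 1 feeds directly into an information criterion.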

Acknowledgement

Supported by: Inha University
