http://dx.doi.org/10.29220/CSAM.2022.29.6.655

Estimation of high-dimensional sparse cross correlation matrix  

Yin, Cao (Department of Statistics, Seoul National University)
Kwangok, Seo (Department of Statistics, Seoul National University)
Soohyun, Ahn (Department of Mathematics, Ajou University)
Johan, Lim (Department of Statistics, Seoul National University)
Publication Information
Communications for Statistical Applications and Methods / v.29, no.6, 2022, pp. 655-664
Abstract
Motivated by an integrative study of multi-omics data, we are interested in estimating the structure of the sparse cross correlation matrix of two high-dimensional random vectors. We recast the problem as a multiple testing problem and propose a new method to estimate the sparse structure of the cross correlation matrix. To do so, we test the correlation coefficients simultaneously and threshold them while controlling the false discovery rate (FDR) at a predetermined level α. Further, we apply the proposed method and an alternative adaptive thresholding procedure by Cai and Liu (2016) to the integrative analysis of the protein expression data (X) and the mRNA expression data (Y) in the TCGA breast cancer cohort. By varying the FDR level α, we show that the new procedure is consistently more efficient in estimating the sparse structure of the cross correlation matrix than the alternative one.
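The general idea of thresholding a cross correlation matrix through multiple testing can be sketched as follows. This is a minimal illustration and not the procedure proposed in the paper: it assumes Fisher z-transformed sample correlations with a normal approximation, and it substitutes the Benjamini and Hochberg (1995) step-up rule for the FDR control step. All function names (`cross_correlation`, `fisher_z_pvalues`, `bh_reject`, `sparse_cross_correlation`) are hypothetical.

```python
import math
import numpy as np

def cross_correlation(X, Y):
    """Sample cross correlation matrix (p x q) of X (n x p) and Y (n x q)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return Xs.T @ Ys / X.shape[0]

def fisher_z_pvalues(R, n):
    """Two-sided p-values for H0: rho_ij = 0 (assumption: Fisher z is ~ normal)."""
    z = np.arctanh(np.clip(R, -0.999999, 0.999999)) * math.sqrt(n - 3)
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    return np.vectorize(math.erfc)(np.abs(z) / math.sqrt(2))

def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up rule; returns a boolean mask of rejections."""
    p = pvals.ravel()
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank passing its threshold
        reject[order[:k + 1]] = True
    return reject.reshape(pvals.shape)

def sparse_cross_correlation(X, Y, alpha=0.05):
    """Keep only the correlations whose null hypothesis is rejected."""
    R = cross_correlation(X, Y)
    keep = bh_reject(fisher_z_pvalues(R, X.shape[0]), alpha)
    return np.where(keep, R, 0.0), keep
```

Calling `sparse_cross_correlation(X, Y, alpha)` returns the thresholded matrix together with the estimated support. Varying `alpha` changes how many entries survive, which mirrors the comparison across FDR levels described in the abstract.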
Keywords
integrative analysis; local false discovery rate; multiple testing; multi-omics data
References
1 Benjamini Y and Hochberg Y (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing, Journal of the Royal Statistical Society, Series B, 57, 289-300.
2 Benjamini Y and Yekutieli D (2001). The control of the false discovery rate in multiple testing under dependency, The Annals of Statistics, 29, 1165-1188.
3 Bennett CM, Wolford GL, and Miller MB (2009). The principled control of false positives in neuroimaging, Social Cognitive and Affective Neuroscience, 4, 417-422.
4 Bickel P and Levina E (2008). Covariance regularization by thresholding, The Annals of Statistics, 36, 2577-2604.
5 Cai T and Liu W (2011). Adaptive thresholding for sparse covariance matrix estimation, Journal of the American Statistical Association, 106, 672-684.
6 Cai T and Liu W (2016). Large-scale multiple testing of correlations, Journal of the American Statistical Association, 111, 229-240.
7 Cheng J, Kapranov P, Drenkow J, and Dike S (2005). Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution, Science, 308, 1149-1154.
8 Dubois PC, Trynka G, Franke L et al. (2010). Multiple common variants for celiac disease influencing immune gene expression, Nature Genetics, 42, 295-302.
9 Efron B and Tibshirani R (2002). Empirical Bayes methods and false discovery rates for microarrays, Genetic Epidemiology, 23, 70-86.
10 Efron B (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis, Journal of the American Statistical Association, 99, 96-104.
11 Elliott P and Wartenberg D (2004). Review spatial epidemiology: Current approaches and future challenges, Environmental Health Perspectives, 112, 998-1006.
12 Fan J, Fan Y, and Lv J (2008). High dimensional covariance matrix estimation using a factor model, Journal of Econometrics, 147, 186-197.
13 Fan J, Han X, and Gu W (2012). Estimating false discovery proportion under arbitrary covariance dependence, Journal of the American Statistical Association, 107, 1019-1035.
14 Han H, Shim H, Shin D, et al. (2015). TRRUST: A reference database of human transcriptional regulatory interactions, Scientific Reports, 5, 11432.
15 Huttlin EL, Ting L, Bruckner RJ, et al. (2015). The bioplex network: A systematic exploration of the human interactome, Cell, 162, 425-440.
16 Jaeger J, Sengupta R, and Ruzzo WL (2003). Improved gene selection for classification of microarrays, Pacific Symposium on Biocomputing, 8, 53-64.
17 Liu W (2013). Gaussian graphical model estimation with false discovery rate control, The Annals of Statistics, 41, 2948-2978.
18 Razick S, Magklaras G, and Donaldson IM (2008). IRefIndex: A consolidated protein interaction database with provenance, BMC Bioinformatics, 9, 405.
19 Rosato A, Tenori L, Cascante M, De Atauri Carulla PR, Martins Dos Santos VA, and Saccenti E (2018). From correlation to causation: Analysis of metabolomics data using systems biology approaches, Metabolomics, 14, 37.
20 Shaw P, Greenstein D, Lerch J, et al. (2006). Intellectual ability and cortical development in children and adolescents, Nature, 440, 676-679.
21 Xia Y, Cai T, and Cai TT (2015). Testing differential networks with applications to detecting gene-by-gene interactions, Biometrika, 102, 247-266.
22 Shedden K and Taylor J (2005). Differential correlation detects complex associations between gene expression and clinical outcomes in lung adenocarcinomas, Methods of Microarray Data Analysis, (pp. 121-131), Springer, Boston.
23 Storey JD (2002). A direct approach to false discovery rates, Journal of the Royal Statistical Society, Series B, 64, 479-498.
24 Wang W and Fan J (2017). Asymptotics of empirical eigenstructure for high dimensional spiked covariance, The Annals of Statistics, 45, 1342-1374.
25 Yu D, Lee SH, Lim J, Xiao G, Craddock RC, and Biswal BB (2018). Fused lasso regression for identifying differential correlations in brain connectome graphs, Statistical Analysis and Data Mining, 11, 203-226.
26 Zhao F, Xuan Z, Liu L, and Zhang MQ (2005). TRED: A Transcriptional Regulatory Element Database and a platform for in silico gene regulation studies, Nucleic Acids Research, 33, D103-D107.
27 Zheng G, Tu K, Yang Q, Xiong Y, Wei C, Xie L, Zhu Y, and Li Y (2008). ITFP: An integrated platform of mammalian transcription factors, Bioinformatics, 24, 2416-2417.