http://dx.doi.org/10.29220/CSAM.2020.27.5.535

Selection probability of multivariate regularization to identify pleiotropic variants in genetic association studies

Kim, Kipoong (Department of Statistics, Pusan National University)
Sun, Hokeun (Department of Statistics, Pusan National University)

Publication Information

Communications for Statistical Applications and Methods / v.27, no.5, 2020, pp. 535-546

Abstract

In genetic association studies, pleiotropy is a phenomenon in which a variant or a genetic region affects multiple traits or diseases. Many studies have identified cross-phenotype genetic associations. However, most statistical approaches for detecting pleiotropy are based on individual tests, in which the association of a single variant with multiple traits is tested one variant at a time. These approaches fail to account for relations among correlated variants. Recently, multivariate regularization methods have been proposed to detect pleiotropy in the analysis of high-dimensional genomic data. However, they suffer from the problem of tuning parameter selection, which often results in either too many false positives or too few true positives. In this article, we applied selection probability to multivariate regularization methods in order to identify pleiotropic variants associated with multiple phenotypes. Selection probability was applied to the individual elastic-net, unified elastic-net, and multi-response elastic-net regularization methods. In simulation studies, the selection performance of the three multivariate regularization methods was evaluated as the total number of phenotypes, the number of phenotypes associated with a variant, and the correlations among phenotypes varied. We also applied the regularization methods to a wild bean dataset consisting of 169,028 variants and 17 phenotypes.
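
The abstract does not say how selection probability is computed, so the following is only a rough sketch of the general idea, under the assumption that it is estimated by refitting a penalized model on repeated random half-samples and recording how often each variant receives a nonzero coefficient. It covers a single phenotype with a plain elastic-net fit at one fixed penalty. The function names (`elastic_net`, `selection_probability`, `simulate`), the penalty values, and the toy data are all hypothetical choices made for illustration, not the authors' procedure.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used by the lasso part of the penalty."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, n_iter=50):
    """Coordinate descent for
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2),
    after centering y and each column of X (so no intercept is needed).
    Assumes alpha < 1 or non-constant columns, so the denominator is positive."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the partial residual (coefficient j removed).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    return beta


def selection_probability(X, y, lam, alpha=0.5, n_subsamples=30, seed=0):
    """Fraction of random half-samples in which each variable is selected
    (nonzero coefficient). The half-sample scheme is an assumption."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        beta = elastic_net([X[i] for i in idx], [y[i] for i in idx], lam, alpha)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]


def simulate(n=100, p=8, causal=(0, 1), seed=1):
    """Toy data: independent Gaussian 'variants'; only `causal` ones affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(row[j] for j in causal) + rng.gauss(0, 0.5) for row in X]
    return X, y


X, y = simulate()
probs = selection_probability(X, y, lam=0.6)
print([round(pr, 2) for pr in probs])
```

Variants are then ranked by their selection probability rather than chosen by a single tuned penalty. How the unified and multi-response elastic-net variants are fitted, and how results are combined across phenotypes, is not described in the abstract and is not attempted here.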

Keywords

pleiotropy; regularization methods; selection probability; high-dimensional data
