http://dx.doi.org/10.29220/CSAM.2022.29.1.065

Grid-based Gaussian process models for longitudinal genetic data  

Chung, Wonil (Department of Statistics and Actuarial Science, Soongsil University)
Publication Information
Communications for Statistical Applications and Methods / v.29, no.1, 2022, pp. 65-83
Abstract
Although various statistical methods have been developed to map time-dependent genetic factors, most identified genetic variants can explain only a small portion of the estimated genetic variation in longitudinal traits. Gene-gene and gene-time/environment interactions are known to be important putative sources of the missing heritability. However, mapping epistatic gene-gene interactions is extremely difficult due to the very large parameter spaces for models containing such interactions. In this paper, we develop a Gaussian process (GP) based nonparametric Bayesian variable selection method for longitudinal data. It maps multiple genetic markers without restricting to pairwise interactions. Rather than modeling each main and interaction term explicitly, the GP model measures the importance of each marker, regardless of whether it is mostly due to a main effect or some interaction effect(s), via an unspecified function. To improve the flexibility of the GP model, we propose a novel grid-based method for the within-subject dependence structure. The proposed method can accurately approximate complex covariance structures. The dimension of the covariance matrix depends only on the number of fixed grid points although each subject may have different numbers of measurements at different time points. The deviance information criterion (DIC) and the Bayesian predictive information criterion (BPIC) are proposed for selecting an optimal number of grid points. To efficiently draw posterior samples, we combine a hybrid Monte Carlo method with a partially collapsed Gibbs (PCG) sampler. We apply the proposed GP model to a mouse dataset on age-related body weight.
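To make the grid-based idea more concrete, here is a minimal sketch of one way a covariance defined on a fixed set of grid points could be mapped to subjects measured at different times. The abstract does not give the paper's actual construction, so everything below is an assumption for illustration: the linear-interpolation weights, the squared-exponential grid covariance, the noise variance, and the function names `interpolation_weights` and `subject_covariance` are all hypothetical.

```python
import numpy as np

def interpolation_weights(times, grid):
    """Linear-interpolation weights from grid points to observation times.

    Assumption for illustration only; the paper may use a different mapping.
    """
    times = np.asarray(times, dtype=float)
    grid = np.asarray(grid, dtype=float)
    weights = np.zeros((times.size, grid.size))
    # Index of the grid interval containing each time, clipped to a valid range.
    idx = np.clip(np.searchsorted(grid, times, side="right") - 1, 0, grid.size - 2)
    left, right = grid[idx], grid[idx + 1]
    frac = np.clip((times - left) / (right - left), 0.0, 1.0)
    rows = np.arange(times.size)
    weights[rows, idx] = 1.0 - frac
    weights[rows, idx + 1] = frac
    return weights

def subject_covariance(times, grid, grid_cov, noise_var):
    """Within-subject covariance induced by a covariance on the grid."""
    w = interpolation_weights(times, grid)
    return w @ grid_cov @ w.T + noise_var * np.eye(w.shape[0])

# Five fixed grid points on a rescaled time axis (hypothetical choice).
grid = np.linspace(0.0, 1.0, 5)
# Hypothetical grid covariance: squared-exponential with length-scale 0.3.
grid_cov = np.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2 / 0.3 ** 2)

# Two subjects with different numbers of measurements at different times.
cov_a = subject_covariance([0.1, 0.5, 0.9], grid, grid_cov, noise_var=0.1)
cov_b = subject_covariance([0.0, 0.2, 0.4, 0.6, 0.8, 1.0], grid, grid_cov, noise_var=0.1)
print(cov_a.shape, cov_b.shape)
```

In this sketch the only covariance object that would need to be estimated is the 5 x 5 `grid_cov`, regardless of how many measurements each subject has, which mirrors the abstract's statement that the dimension of the covariance matrix depends only on the number of grid points.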
Keywords
Bayesian; longitudinal; Gaussian process; hybrid Monte Carlo; PCG sampler