Grid-based Gaussian process models for longitudinal genetic data

  • Chung, Wonil (Department of Statistics and Actuarial Science, Soongsil University)
  • Received : 2021.07.08
  • Accepted : 2021.11.15
  • Published : 2022.01.31

Abstract

Although various statistical methods have been developed to map time-dependent genetic factors, most identified genetic variants explain only a small portion of the estimated genetic variation in longitudinal traits. Gene-gene and gene-time/environment interactions are considered important putative sources of this missing heritability. However, mapping epistatic gene-gene interactions is extremely difficult because models containing such interactions have very large parameter spaces. In this paper, we develop a Gaussian process (GP) based nonparametric Bayesian variable selection method for longitudinal data. The method maps multiple genetic markers without being restricted to pairwise interactions. Rather than modeling each main and interaction term explicitly, the GP model uses an unspecified function to measure the importance of each marker, regardless of whether that importance arises mainly from a main effect or from some interaction effect(s). To improve the flexibility of the GP model, we propose a novel grid-based method for the within-subject dependence structure. The proposed method can accurately approximate complex covariance structures. The dimension of the covariance matrix depends only on the number of fixed grid points, even though each subject may have a different number of measurements taken at different time points. The deviance information criterion (DIC) and the Bayesian predictive information criterion (BPIC) are proposed for selecting the optimal number of grid points. To draw posterior samples efficiently, we combine a hybrid Monte Carlo method with a partially collapsed Gibbs (PCG) sampler. We apply the proposed GP model to a mouse dataset on age-related body weight.
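The abstract states that the covariance dimension depends only on the number of grid points, not on how many measurements each subject has. One plausible way to achieve this, offered here as an illustrative assumption and not necessarily the construction used in the paper, is to place a single G x G covariance matrix Σ_G on fixed grid points and map each subject's observation times onto the grid with linear-interpolation weights A_i, giving Cov(y_i) = A_i Σ_G A_iᵀ + σ²I. The helper names `interpolation_matrix` and `subject_covariance`, the AR(1)-style Σ_G, and the example times are all hypothetical.

```python
import numpy as np


def interpolation_matrix(times, grid):
    """Hypothetical helper: linear-interpolation weights (n_i x G).

    Row k maps the G grid values to observation time times[k]. Each row has
    at most two nonzero weights (the grid points bracketing the time) and
    sums to 1. Times outside [grid[0], grid[-1]] are clamped to the ends.
    Assumes `grid` is strictly increasing with at least two points.
    """
    grid = np.asarray(grid, dtype=float)
    times = np.clip(np.asarray(times, dtype=float), grid[0], grid[-1])
    A = np.zeros((times.size, grid.size))
    for k, t in enumerate(times):
        # Index of the right-hand bracketing grid point, kept in 1..G-1.
        j = int(np.searchsorted(grid, t, side="right"))
        j = min(max(j, 1), grid.size - 1)
        w = (t - grid[j - 1]) / (grid[j] - grid[j - 1])
        A[k, j - 1] = 1.0 - w
        A[k, j] = w
    return A


def subject_covariance(times, grid, sigma_grid, noise_var):
    """Hypothetical helper: Cov(y_i) = A_i Sigma_G A_i^T + noise_var * I.

    Only the G x G matrix sigma_grid is a free parameter, so the number of
    covariance parameters does not grow with n_i or with the time points.
    """
    A = interpolation_matrix(times, grid)
    return A @ sigma_grid @ A.T + noise_var * np.eye(A.shape[0])


# Usage: 5 fixed grid points on [0, 10] with an AR(1)-style grid covariance.
grid = np.linspace(0.0, 10.0, 5)
lags = np.abs(np.subtract.outer(np.arange(5), np.arange(5)))
sigma_grid = 0.8 ** lags

cov_a = subject_covariance([1.0, 4.0, 9.0], grid, sigma_grid, 0.1)
cov_b = subject_covariance([0.0, 2.0, 5.0, 6.0, 10.0], grid, sigma_grid, 0.1)
print(cov_a.shape, cov_b.shape)  # (3, 3) (5, 5)
```

Both subjects, with three and five measurements at different ages, draw their covariance from the same 5 x 5 matrix, so adding subjects or measurement times adds no covariance parameters. Linear interpolation is only one possible mapping from observation times to grid points; the grid construction in the paper may differ.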

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (NRF-2020R1C1C1A01012657) and Basic Science Research Program through the NRF funded by the Ministry of Education (2021R1A6A1A10044154). This work was supported by Soongsil University Research Fund.
