Bayesian mixed models for longitudinal genetic data: theory, concepts, and simulation studies

  • Chung, Wonil (Department of Statistics and Actuarial Science, Soongsil University)
  • Cho, Youngkwang (Department of Statistics and Actuarial Science, Soongsil University)
  • Received : 2021.12.13
  • Accepted : 2022.03.03
  • Published : 2022.03.31

Abstract

Despite the success of recent genome-wide association studies investigating longitudinal traits, a large fraction of overall heritability remains unexplained. Some of this missing heritability may be accounted for by gene-gene and gene-time/environment interactions. In this paper, we develop a Bayesian variable selection method for longitudinal genetic data based on mixed models. The method jointly models the main effects and interactions of all candidate genetic variants and non-genetic factors and has higher statistical power than previous approaches. To account for the within-subject dependence structure, we propose a grid-based approach that models only one fixed-dimensional covariance matrix and is therefore applicable to data in which subjects have different numbers of time points. We provide the theoretical basis of our Bayesian method and then illustrate its performance using data from the 1000 Genomes Project under various simulation settings. Several simulation studies show that our multivariate method has greater statistical power than the corresponding univariate method and detects gene-time/environment interactions well. We further evaluate our method with different numbers of individuals, variants, and causal variants, as well as different levels of trait heritability, and conclude that it performs reasonably well across these settings.
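
The grid-based idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each subject's observation times are snapped to the nearest point of a shared time grid, and that the subject's covariance is the matching sub-block of one grid-sized matrix. The function names, the grid, and the AR(1)-style matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def grid_indices(times, grid):
    """Map each observed time to the index of its nearest grid point (assumed rule)."""
    times = np.asarray(times, dtype=float)
    grid = np.asarray(grid, dtype=float)
    return np.abs(times[:, None] - grid[None, :]).argmin(axis=1)

def subject_covariance(sigma, times, grid):
    """Extract the sub-block of the shared covariance matrix for one subject."""
    idx = grid_indices(times, grid)
    return sigma[np.ix_(idx, idx)]

# Hypothetical shared grid with four time points.
grid = np.array([0.0, 1.0, 2.0, 3.0])

# Illustrative AR(1)-style covariance on the grid; any positive-definite
# matrix of size len(grid) x len(grid) could play this role.
rho = 0.5
sigma = rho ** np.abs(np.subtract.outer(grid, grid))

# Two subjects with different numbers of visits share the same sigma.
cov_a = subject_covariance(sigma, [0.1, 2.9], grid)       # 2 visits
cov_b = subject_covariance(sigma, [0.0, 1.2, 2.1], grid)  # 3 visits
```

Under this assumption only the single matrix `sigma` has to be estimated, however many visits each subject has. If two visits of one subject fall on the same grid point, the extracted block is singular, so a practical version would need a finer grid or an extra residual variance term.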

Acknowledgement

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (2020R1C1C1A01012657), by the Basic Science Research Program through the NRF funded by the Ministry of Education (2021R1A6A1A10044154), and by the Soongsil University Research Fund.
