
A maximum likelihood approach to infer demographic models

  • Chung, Yujin (Department of Applied Statistics, Kyonggi University)
  • Received : 2020.03.20
  • Accepted : 2020.04.20
  • Published : 2020.05.31

Abstract

We present a new maximum likelihood approach for estimating demographic history from genomic data sampled from two populations. A demographic model such as the isolation-with-migration (IM) model explains the genetic divergence of two populations that split from a common ancestral population. The standard probability model for an IM model contains a latent variable, the genealogy, which represents gene-specific evolutionary paths and links the genetic data to the IM model. Under an IM model, a genealogy consists of two kinds of evolutionary paths: vertical inheritance paths through generations (coalescent events) and horizontal paths between populations (migration events). The computational complexity of IM model inference is one of the major limitations in analyzing genomic data. We propose a fast, two-step maximum likelihood approach for estimating IM models from genomic data. The first step analyzes the genomic data and estimates, by maximum likelihood, coalescent trees, which contain the vertical paths of the genealogy. The second step analyzes the estimated coalescent trees and finds the IM model parameter values that maximize the probability of these trees, accounting for possible migration events. We evaluate the performance of the new method by analyzing simulated data and genomic data from two subspecies of common chimpanzee in Africa.
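The two-step idea can be illustrated in its simplest case: one sequence sampled from each population, so that each locus's coalescent tree reduces to a single coalescence time. The sketch below is not the author's implementation. It assumes, for step 1, the closed-form Jukes-Cantor (JC69) maximum likelihood distance with a known per-site rate `mu`, and, for step 2, a symmetric IM model with one coalescence rate `c` shared by both populations and their ancestor, a migration rate `m` per lineage in each direction, and a split time `tau`, fitted by grid search. All function names are hypothetical. Migration is accounted for by summing over all migration histories through a small continuous-time Markov chain.

```python
import math

# Step 1 (assumed simplification): with ONE sequence sampled per population,
# the coalescent tree reduces to a single coalescence time, estimated here from
# the closed-form Jukes-Cantor (JC69) ML distance.
def jc_distance(seq_a: str, seq_b: str) -> float:
    """ML distance between two aligned, gap-free sequences under JC69."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def coalescence_time(seq_a: str, seq_b: str, mu: float) -> float:
    """Distance d = 2 * mu * t, so t = d / (2 * mu); mu is an assumed known rate."""
    return jc_distance(seq_a, seq_b) / (2.0 * mu)


# Step 2 (assumed simplification): symmetric IM model with one coalescence
# rate c shared by both sampled populations and the ancestor, migration rate m
# per lineage in each direction, and split time tau.
def _mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def _expm(A, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    s = 0
    while norm > 0.5:  # scale A down so the Taylor series converges quickly
        norm /= 2.0
        s += 1
    scaled = [[x / 2.0 ** s for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in _mat_mul(term, scaled)]
        result = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(result, term)]
    for _ in range(s):  # undo the scaling: exp(A) = exp(A / 2**s) ** (2**s)
        result = _mat_mul(result, result)
    return result


def _state_probs(t, c, m):
    """Probabilities of the non-coalesced states 0, 1, 2 at time t (t <= tau).

    0 = both lineages in pop 1, 1 = one in each, 2 = both in pop 2.
    The process starts in state 1 (one sequence sampled per population)."""
    S = [[-(2 * m + c), 2 * m, 0.0],
         [m, -2 * m, m],
         [0.0, 2 * m, -(2 * m + c)]]
    return _expm([[x * t for x in row] for row in S])[1]


def coal_density(t, c, m, tau):
    """Density of the coalescence time t, summed over all migration histories."""
    if t < tau:
        p = _state_probs(t, c, m)
        return c * (p[0] + p[2])  # lineages coalesce only when they share a population
    survive = sum(_state_probs(tau, c, m))  # P(no coalescence before the split)
    return survive * c * math.exp(-c * (t - tau))


def log_likelihood(times, c, m, tau):
    total = 0.0
    for t in times:
        f = coal_density(t, c, m, tau)
        if f <= 0.0:
            return float("-inf")
        total += math.log(f)
    return total


def fit_grid(times, c_grid, m_grid, tau_grid):
    """Grid-search ML estimate; returns (loglik, c, m, tau)."""
    best = None
    for c in c_grid:
        for m in m_grid:
            for tau in tau_grid:
                ll = log_likelihood(times, c, m, tau)
                if best is None or ll > best[0]:
                    best = (ll, c, m, tau)
    return best
```

In use, one would compute `times = [coalescence_time(a, b, mu) for a, b in locus_pairs]` and then call `fit_grid(times, c_grid, m_grid, tau_grid)`, where `locus_pairs`, `mu`, and the grids are user-supplied. The matrix exponential is written in pure Python only to keep the sketch dependency-free; `scipy.linalg.expm` and a numerical optimizer would be the usual choices in practice.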
