Effect of an unsampled population on the estimation of a population size

  • Chung, Yujin (Department of Applied Statistics, Kyonggi University)
  • Received : 2020.04.19
  • Accepted : 2020.04.30
  • Published : 2020.06.30

Abstract

The Isolation-with-Migration (IM) model is an evolutionary model widely used to estimate the sizes of extant populations, the time at which they split from their common ancestral population, and the migration rates between them. Evolutionary models such as the IM model are estimated by analyzing DNA sequences sampled from the extant populations in the model. When the true model includes an unsampled 'ghost' population, one for which no data are available, that population is often omitted from the model being inferred. In this paper, we conduct a simulation study to investigate the effect of an unsampled population on the estimation of the size of a sampled population. When an unsampled population exchanged migrants with the sampled population, the size estimate of the sampled population was biased. However, in many cases the estimate improved when the fitted evolutionary model included the unsampled population.
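
The direction of the bias described above can be illustrated with a toy coalescent simulation. This is only a sketch under simplifying assumptions and is not the simulation design of the paper: it assumes an equilibrium two-population model with no divergence event, equal population sizes, symmetric migration, infinite-sites mutations, and the mean number of pairwise differences as a simple estimator of the population-size parameter theta = 4Nμ. All function names and parameter values are hypothetical.

```python
import random


def pairwise_coalescence_time(mig, rng):
    """Coalescence time (in units of 2N generations) of two lineages
    sampled from population 1, which exchanges migrants with one
    unsampled population of equal size at per-lineage rate `mig`.
    With mig = 0 this reduces to a single isolated population."""
    t = 0.0
    same_deme = True  # both lineages start in the sampled population
    while True:
        if same_deme:
            # Competing events: coalescence (rate 1) or either lineage
            # migrating to the other population (rate 2 * mig).
            total_rate = 1.0 + 2.0 * mig
            t += rng.expovariate(total_rate)
            if rng.random() < 1.0 / total_rate:
                return t
            same_deme = False
        else:
            # Lineages in different populations cannot coalesce; wait
            # until one of them migrates and they share a population.
            t += rng.expovariate(2.0 * mig)
            same_deme = True


def poisson_draw(mean, rng):
    """Poisson variate: count unit-rate arrivals in the interval [0, mean]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def mean_pairwise_differences(theta, mig, n_loci, seed):
    """Average number of differences between two sequences from the
    sampled population over independent loci. In an isolated
    population its expectation is theta, so it serves here as a naive
    size estimator that ignores the unsampled population."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_loci):
        t = pairwise_coalescence_time(mig, rng)
        # Mutations fall on both branches: Poisson with mean theta * t.
        total += poisson_draw(theta * t, rng)
    return total / n_loci


if __name__ == "__main__":
    theta = 5.0  # hypothetical true value for the sampled population
    print("isolated:  ", mean_pairwise_differences(theta, 0.0, 20000, seed=1))
    print("with ghost:", mean_pairwise_differences(theta, 0.5, 20000, seed=2))
```

In this toy setting the expected pairwise coalescence time is 1 without migration and 2 with any positive symmetric migration rate, so the naive estimator targets roughly twice the true theta once the ghost population is present. An IM model with a divergence time, as studied in the paper, would behave differently in detail.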
