Identifying Copy Number Variants under Selection in Geographically Structured Populations Based on F-statistics

  • Song, Hae-Hiang (Division of Biostatistics, Department of Medical Lifescience, The Catholic University of Korea, College of Medicine) ;
  • Hu, Hae-Jin (Department of Microbiology, Integrated Research Center for Genome Polymorphism, The Catholic University of Korea, College of Medicine) ;
  • Seok, In-Hae (Department of Statistics, Hankuk University of Foreign Studies) ;
  • Chung, Yeun-Jun (Department of Microbiology, Integrated Research Center for Genome Polymorphism, The Catholic University of Korea, College of Medicine)
  • Received : 2012.04.10
  • Accepted : 2012.05.17
  • Published : 2012.06.30

Abstract

Large-scale copy number variants (CNVs) in the human genome provide the raw material for delineating population differences, as natural selection may have affected at least some of the CNVs discovered thus far. Although studies of inter-ethnic differences in CNVs across relatively large numbers of specific ethnic groups have recently begun, particular instances of natural selection have not yet been identified or characterized. The traditional $F_{ST}$ measure, obtained from differences in allele frequencies between populations, has been used to identify CNV loci subject to geographically varying selection. Here, we review advances in, and the application of, multinomial-Dirichlet likelihood methods of inference for identifying genome regions that have been subject to natural selection using $F_{ST}$ estimates. The material presented is not new; however, this review clarifies how these methods can be applied to CNV data, an application that remains largely unexplored. A hierarchical Bayesian method, implemented via Markov chain Monte Carlo, estimates locus-specific $F_{ST}$ and can identify outlying CNV loci with large values of $F_{ST}$. By applying this Bayesian method to publicly available CNV data, we identified CNV loci that show signals of natural selection, which may help elucidate the genetic basis of human disease and diversity.
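To make the "traditional" $F_{ST}$ idea concrete, the following minimal Python sketch computes a heterozygosity-based $F_{ST} = (H_T - H_S)/H_T$ for a single locus from per-population allele frequencies. It is an illustration only and is not the hierarchical Bayesian method reviewed here. The function name `fst_from_frequencies`, the equal weighting of populations, and the treatment of copy-number classes as alleles are all assumptions made for the example.

```python
def fst_from_frequencies(pop_freqs):
    """Heterozygosity-based F_ST for one locus.

    pop_freqs: list of per-population frequency vectors, one entry per
    allele (here, hypothetically, one entry per copy-number class).
    Populations are weighted equally (an assumption of this sketch).
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])

    # Mean within-population expected heterozygosity, H_S.
    h_s = sum(1.0 - sum(p * p for p in freqs) for freqs in pop_freqs) / n_pops

    # Expected heterozygosity of the pooled (average) frequencies, H_T.
    mean_freqs = [
        sum(freqs[k] for freqs in pop_freqs) / n_pops for k in range(n_alleles)
    ]
    h_t = 1.0 - sum(p * p for p in mean_freqs)

    # A locus that is monomorphic overall carries no differentiation signal.
    if h_t == 0.0:
        return 0.0
    return (h_t - h_s) / h_t


# Two populations, two copy-number classes with mirrored frequencies.
print(fst_from_frequencies([[0.2, 0.8], [0.8, 0.2]]))
```

Identical frequencies in every population give a value of 0, and populations fixed for different classes give 1. Loci with unusually large values relative to the rest of the genome are the kind of outliers that the model-based approach is designed to detect more rigorously.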

Cited by

  1. A genome-wide characterization of copy number variations in native populations of Peninsular Malaysia vol.26, pp.6, 2018, https://doi.org/10.1038/s41431-018-0120-8