How Many SNPs Should Be Used for the Human Phylogeny of Highly Related Ethnicities? A Case of Pan Asian 63 Ethnicities

  • Ghang, Ho-Young (Department of Electrical Engineering, Korea Advanced Institute of Science and Technology) ;
  • Han, Young-Joo (Department of Electrical Engineering, Korea Advanced Institute of Science and Technology) ;
  • Jeong, Sang-Jin (Department of Electrical Engineering, Korea Advanced Institute of Science and Technology) ;
  • Bhak, Jong (Theragen Bio Institute, Theragen Etex Co. Ltd.) ;
  • Lee, Sung-Hoon (Theragen Bio Institute, Theragen Etex Co. Ltd.) ;
  • Kim, Tae-Hyung (Theragen Bio Institute, Theragen Etex Co. Ltd.) ;
  • Kim, Chul-Hong (Theragen Bio Institute, Theragen Etex Co. Ltd.) ;
  • Kim, Sang-Soo (Department of Bioinformatics & Life Sciences, Soongsil University) ;
  • Al-Mulla, Fahd (Department of Pathology, University of Kuwait) ;
  • Youn, Chan-Hyun (Department of Electrical Engineering, Korea Advanced Institute of Science and Technology) ;
  • Yoo, Hyang-Sook (Korea Research Institute of Bioscience and Biotechnology (KRIBB)) ;
  • The HUGO Pan-Asian SNP Consortium
  • Received : 2011.11.14
  • Accepted : 2011.12.01
  • Published : 2011.12.31

Abstract

In planning a model-based phylogenetic study of highly related ethnic groups, the number of SNP markers is an important factor in inferring relationships. Genotype frequency data from 63 Pan-Asian ethnic groups were analyzed with a sub-sampling method to determine the minimum number of SNPs required to establish such relationships. Bootstrap random sub-samples were drawn from the 56K PASNPi SNP data. Nei's DA distance was calculated and a neighbor-joining tree was drawn for every re-sampled data set. Consensus trees were built from each set of 100 sub-samples of the same size, and bootstrap proportions were calculated. Consistency with the tree obtained from the whole marker set improved as the number of markers increased. The bootstrap proportions became reliable when more than 7,000 SNPs were used at a time. Within highly related ethnic groups, the minimum number of SNPs for a robust neighbor-joining tree inference was about 7,000 for 95% bootstrap support.
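The sub-sampling procedure summarized above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' pipeline. It assumes that each population is described by reference-allele frequencies of biallelic SNPs listed in the same SNP order; that Nei's DA distance is computed as D_A = (1/r) Σ_j [1 − Σ_i sqrt(x_ij · y_ij)], where r is the number of loci and x_ij, y_ij are the frequencies of allele i at locus j in the two populations; and that SNPs are drawn without replacement. As a simplification, instead of building a full consensus tree, support is measured as the fraction of replicate neighbor-joining trees that contain each split of the tree built from all markers. The names da_distance, nj_splits and bootstrap_support are hypothetical.

```python
import random
from math import sqrt


def da_distance(p, q):
    """Nei's DA distance for biallelic SNPs.

    p, q: reference-allele frequencies (in [0, 1]) of two populations,
    one value per SNP, in the same SNP order (an assumption of this sketch).
    """
    total = 0.0
    for pj, qj in zip(p, q):
        # 1 minus the sum over both alleles of sqrt(x_i * y_i) at this locus
        total += 1.0 - (sqrt(pj * qj) + sqrt((1.0 - pj) * (1.0 - qj)))
    return total / len(p)


def canonical_split(cluster, leaves, anchor):
    """Represent an unrooted split by the side that does NOT contain `anchor`."""
    return frozenset(leaves - cluster) if anchor in cluster else frozenset(cluster)


def nj_splits(names, dist):
    """Neighbor-joining topology, returned as a set of non-trivial splits.

    Branch lengths are not tracked because only the topology is compared.
    """
    leaves = frozenset(names)
    anchor = names[0]
    clusters = {n: frozenset([n]) for n in names}  # node -> leaves below it
    d = {a: {b: dist[a][b] for b in names} for a in names}
    splits = set()
    next_id = 0
    while len(clusters) > 3:
        active = list(clusters)
        n = len(active)
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = active[x], active[y]
                qij = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or qij < best[0]:
                    best = (qij, i, j)
        _, i, j = best
        u = ("node", next_id)
        next_id += 1
        clusters[u] = clusters.pop(i) | clusters.pop(j)
        splits.add(canonical_split(clusters[u], leaves, anchor))
        # Distance from the new node u to every remaining node k
        d[u] = {}
        for k in clusters:
            if k != u:
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2.0
    return splits


def bootstrap_support(freqs, n_snps, n_reps=100, seed=0):
    """Fraction of sub-sampled NJ trees containing each split of the full tree.

    freqs: dict population -> list of allele frequencies (same SNP order).
    n_snps: number of SNPs drawn (without replacement) per replicate.
    """
    names = sorted(freqs)
    total = len(freqs[names[0]])
    rng = random.Random(seed)

    def tree_from(indices):
        sub = {pop: [freqs[pop][k] for k in indices] for pop in names}
        dist = {a: {b: da_distance(sub[a], sub[b]) for b in names} for a in names}
        return nj_splits(names, dist)

    reference = tree_from(range(total))  # tree from the whole marker set
    counts = {s: 0 for s in reference}
    for _ in range(n_reps):
        idx = rng.sample(range(total), n_snps)
        for s in tree_from(idx) & reference:
            counts[s] += 1
    return {s: c / n_reps for s, c in counts.items()}
```

For example, with four populations in which A and B (and likewise C and D) differ only slightly in allele frequency, the {A, B} | {C, D} split is recovered in every replicate, giving a support of 1.0. Calling bootstrap_support over a range of n_snps values mirrors the study design of tracking support as the marker number grows.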
