http://dx.doi.org/10.5808/GI.2011.9.4.181

How Many SNPs Should Be Used for the Human Phylogeny of Highly Related Ethnicities? A Case of Pan Asian 63 Ethnicities  

Ghang, Ho-Young (Department of Electrical Engineering, Korea Advanced Institute of Science and Technology)
Han, Young-Joo (Department of Electrical Engineering, Korea Advanced Institute of Science and Technology)
Jeong, Sang-Jin (Department of Electrical Engineering, Korea Advanced Institute of Science and Technology)
Bhak, Jong (Theragen Bio Institute, Theragen Etex Co. Ltd.)
Lee, Sung-Hoon (Theragen Bio Institute, Theragen Etex Co. Ltd.)
Kim, Tae-Hyung (Theragen Bio Institute, Theragen Etex Co. Ltd.)
Kim, Chul-Hong (Theragen Bio Institute, Theragen Etex Co. Ltd.)
Kim, Sang-Soo (Department of Bioinformatics & Life Sciences, Soongsil University)
Al-Mulla, Fahd (Department of Pathology, University of Kuwait)
Youn, Chan-Hyun (Department of Electrical Engineering, Korea Advanced Institute of Science and Technology)
Yoo, Hyang-Sook (Korea Research Institute of Bioscience and Biotechnology (KRIBB))
The HUGO Pan-Asian SNP Consortium
Abstract
In planning a model-based phylogenetic study of highly related ethnic groups, the number of SNP markers is an important factor to determine for inferring relationships. Genotype frequency data from 63 Pan-Asian ethnic groups were sub-sampled to determine the minimum number of SNPs required to establish such relationships. Bootstrap random sub-samples were drawn from the 5.6K PASNPi SNP data. For every re-sampled data set, the DA distance was calculated and a neighbor-joining tree was drawn. Consensus trees were built from the same 100 sub-samples, and bootstrap proportions were calculated. Consistency with the tree obtained from the whole marker set improved as the number of markers increased. The bootstrap proportions became reliable when more than 7,000 SNPs were used at a time. Within highly related ethnic groups, the minimum number of SNPs for robust neighbor-joining tree inference was about 7,000 for 95% bootstrap support.
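The distance and sub-sampling steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes biallelic SNPs summarized as one allele frequency per population, uses Nei's DA distance in its standard form (one minus the per-locus average of the summed square roots of matching allele frequency products), and draws markers without replacement. The function names, the input layout, and the toy frequencies are all hypothetical.

```python
import math
import random


def da_distance(freq_x, freq_y):
    """Nei's DA distance between two populations for biallelic SNPs.

    freq_x, freq_y: per-SNP frequencies of one allele (the other is 1 - p).
    """
    if len(freq_x) != len(freq_y) or not freq_x:
        raise ValueError("frequency lists must be non-empty and equal length")
    shared = 0.0
    for p, q in zip(freq_x, freq_y):
        # Sum over both alleles at this locus of sqrt(x_i * y_i).
        shared += math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q))
    return 1.0 - shared / len(freq_x)


def distance_matrix(freqs, snp_indices):
    """Pairwise DA matrix over the chosen SNP columns.

    freqs: dict mapping population name -> list of allele frequencies.
    Returns the sorted population names and a square list-of-lists matrix.
    """
    names = sorted(freqs)
    sub = {n: [freqs[n][i] for i in snp_indices] for n in names}
    matrix = [[da_distance(sub[a], sub[b]) for b in names] for a in names]
    return names, matrix


def subsampled_matrices(freqs, n_snps, n_replicates=100, seed=0):
    """Yield one DA matrix per random sub-sample of n_snps markers."""
    rng = random.Random(seed)
    total = len(next(iter(freqs.values())))
    for _ in range(n_replicates):
        # Sampling without replacement is an assumption of this sketch.
        idx = rng.sample(range(total), n_snps)
        yield distance_matrix(freqs, idx)


if __name__ == "__main__":
    # Toy data: three hypothetical populations, four SNPs.
    toy = {
        "pop_a": [0.10, 0.50, 0.90, 0.30],
        "pop_b": [0.15, 0.45, 0.85, 0.35],
        "pop_c": [0.80, 0.20, 0.40, 0.70],
    }
    for names, matrix in subsampled_matrices(toy, n_snps=2, n_replicates=3):
        print(names, [[round(d, 4) for d in row] for row in matrix])
```

Each matrix produced this way would then be passed to a neighbor-joining implementation, and the resulting trees summarized into a consensus tree with bootstrap proportions; those steps are left out here to keep the sketch short.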
Keywords
neighbor-joining; phylogeny; minimum SNP; ethnic group