http://dx.doi.org/10.29220/CSAM.2019.26.1.001

Evaluation of the classification method using ancestry SNP markers for ethnic group  

Lee, Hyo Jung (Product Development HQ)
Hong, Sun Pyo (Research & Development Center, GeneMatrix Inc.)
Lee, Soong Deok (Department of Forensic Medicine, Seoul National University, College of Medicine)
Rhee, Hwan seok (Bioinformatics Research Center, Macrogen Inc.)
Lee, Ji Hyun (Research & Development Center, GeneMatrix Inc.)
Jeong, Su Jin (Department of Statistics, Korea University)
Lee, Jae Won (Department of Statistics, Korea University)
Publication Information
Communications for Statistical Applications and Methods, v.26, no.1, 2019, pp. 1-9
Abstract
Various probabilistic methods have been proposed for using interpopulation allele frequency differences to infer the ethnic group of a DNA specimen. The selection of the statistical method is critical because the accuracy of the classification results varies with the method used. For ancestry classification, we propose a new ancestry evaluation method that estimates a combined ethnicity index, and we compare its performance with various classical classification methods using two real data sets. We selected 13 single nucleotide polymorphisms (SNPs) that are useful for inferring ethnic origin. These SNPs were analyzed by restriction fragment mass polymorphism assay, followed by classification among ethnic groups. We genotyped 400 individuals from four ethnic groups (100 African-American, 100 Caucasian, 100 Korean, and 100 Mexican-American) for 13 SNPs with allele frequencies that differed among the four ethnic groups. Additionally, we applied our new method to HapMap SNP genotypes for 1,011 samples from four populations (African, European, East Asian, and Central-South Asian). Our proposed method yielded the highest accuracy among the statistical classification methods compared. Our ethnic group classification system, based on the analysis of ancestry informative SNP markers, can provide a useful statistical tool for identifying ethnic groups.
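To illustrate the general idea of using interpopulation allele frequency differences for classification, the following is a minimal sketch of a generic likelihood-based classifier. It is not the combined ethnicity index proposed in this paper, whose definition is not given in the abstract. The sketch assumes Hardy-Weinberg proportions and independence between SNPs; the population labels, allele frequencies, and function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical allele frequencies (illustrative only, not taken from the paper):
# frequency of one allele at three made-up SNPs in two made-up populations.
FREQS = {
    "pop_A": [0.9, 0.2, 0.6],
    "pop_B": [0.3, 0.7, 0.5],
}


def genotype_prob(p, g):
    """Hardy-Weinberg probability of carrying g copies (0, 1, or 2)
    of an allele whose population frequency is p."""
    return math.comb(2, g) * p**g * (1 - p) ** (2 - g)


def log_likelihoods(genotype, freqs=FREQS):
    """Log-likelihood of a multi-SNP genotype under each population,
    assuming the SNPs are independent (an assumption of this sketch)."""
    return {
        pop: sum(math.log(genotype_prob(p, g)) for p, g in zip(ps, genotype))
        for pop, ps in freqs.items()
    }


def classify(genotype, freqs=FREQS):
    """Assign the genotype to the population with the highest log-likelihood."""
    scores = log_likelihoods(genotype, freqs)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    sample = [2, 0, 1]  # allele counts at the three hypothetical SNPs
    print(log_likelihoods(sample))
    print(classify(sample))
```

A genotype is encoded as the number of copies of the tracked allele at each SNP. Frequencies of exactly 0 or 1 would need smoothing before taking logarithms; the sketch avoids that case by construction.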
Keywords
single nucleotide polymorphisms (SNP); allele; ethnic group; classification; Korean population;