http://dx.doi.org/10.5808/GI.2009.7.2.136

A Scheme for Filtering SNPs Imputed in 8,842 Korean Individuals Based on the International HapMap Project Data  

Lee, Ki-Chan (Department of Bioinformatics & Life Science, Soongsil University)
Kim, Sang-Soo (Department of Bioinformatics & Life Science, Soongsil University)
Abstract
Genome-wide association (GWA) studies may benefit from the inclusion of imputed SNPs in their datasets. Because imputation is predictive, it is typically not perfect. It is therefore desirable to have a scheme for filtering imputed SNPs that maximizes their concordance with the observed genotypes. We report such a scheme, which is based on a combination of several parameters calculated by PLINK, a popular GWA analysis software program. We imputed the genotypes of 8,842 Korean individuals, based on approximately 2 million SNP genotypes of the CHB+JPT panel in the International HapMap Project Phase II data, complementing the 352k SNPs in the original Affymetrix 5.0 dataset. A total of 333,418 SNPs were found in both datasets, with a median concordance rate of 98.7%. The concordance rates were calculated over different ranges of parameters, such as the number of proxy SNPs (NPRX), the fraction of successfully imputed individuals (IMPUTED), and the information content (INFO). The poor concordance observed at the lower values of these parameters allowed us to derive an optimal combination of cutoffs (IMPUTED $\geq$ 0.9 and INFO $\geq$ 0.9). A total of 1,026,596 SNPs passed the cutoffs, of which 94,364 were found in both datasets and had a median concordance of 99.4%. This study illustrates a conservative scheme for filtering imputed SNPs that would be useful in GWA studies.
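The filtering rule and the concordance check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout, the SNP identifiers, the toy values, and the helper names `passes_filter` and `concordance` are all hypothetical, and concordance is assumed here to be the fraction of individuals whose imputed genotype matches the observed one, with missing calls skipped. Only the parameter names (NPRX, IMPUTED, INFO) and the two cutoffs come from the abstract.

```python
# Cutoffs reported in the abstract.
IMPUTED_MIN = 0.9
INFO_MIN = 0.9


def passes_filter(snp):
    """Return True if an imputed SNP meets both cutoffs."""
    return snp["IMPUTED"] >= IMPUTED_MIN and snp["INFO"] >= INFO_MIN


def concordance(observed, imputed):
    """Fraction of individuals whose imputed genotype equals the observed one.

    Assumption: pairs in which either call is missing (None) are skipped.
    Returns None when no comparable pair exists.
    """
    pairs = [(o, i) for o, i in zip(observed, imputed)
             if o is not None and i is not None]
    if not pairs:
        return None
    return sum(o == i for o, i in pairs) / len(pairs)


# Hypothetical per-SNP summary records (toy values, not study data).
snps = [
    {"SNP": "rsA", "NPRX": 5, "IMPUTED": 0.98, "INFO": 0.95},
    {"SNP": "rsB", "NPRX": 2, "IMPUTED": 0.85, "INFO": 0.97},  # fails IMPUTED
    {"SNP": "rsC", "NPRX": 4, "IMPUTED": 0.93, "INFO": 0.72},  # fails INFO
    {"SNP": "rsD", "NPRX": 5, "IMPUTED": 0.90, "INFO": 0.90},  # on the boundary
]
kept = [s["SNP"] for s in snps if passes_filter(s)]

# Hypothetical genotype calls for one SNP typed in both datasets.
observed = ["AA", "AG", "GG", None, "AG"]
imputed = ["AA", "AG", "AG", "GG", "AG"]
rate = concordance(observed, imputed)

print(kept)
print(rate)
```

Because both cutoffs are inclusive, a SNP sitting exactly at 0.9 on both parameters is retained. Requiring both conditions at once is what makes the scheme conservative: a SNP with high information content is still dropped if too few individuals were imputed successfully.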
Keywords
genome-wide association; HapMap; PLINK; SNP imputation