Adjusting sampling bias in case-control genetic association studies

  • Seo, Geum Chu (Department of Statistics, Seoul National University)
  • Park, Taesung (Department of Statistics and Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Received : 2014.06.30
  • Accepted : 2014.08.04
  • Published : 2014.09.30

Abstract

Genome-wide association studies (GWAS) are designed to discover genetic variants, such as single nucleotide polymorphisms (SNPs), that are associated with complex human traits. Although there is increasing interest in applying GWAS methodologies to population-based cohorts, many published GWAS have adopted a case-control design, which raises the issue of sampling bias in both case and control samples. Because cases and controls are selected with unequal probabilities, the samples are not representative of the population they are purported to represent. Non-random sampling in a case-control study can therefore lead to inconsistent and biased estimates of SNP-trait associations. In this paper, we propose inverse-probability-of-sampling weights based on disease prevalence to eliminate case-control sampling bias in estimating and testing associations between SNPs and quantitative traits. We apply the proposed method to data from the Korea Association Resource project and show that the standard estimators applied to the weighted data yield unbiased estimates.
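The abstract does not state the weight formula, so the following is only a minimal sketch of one common form of inverse-probability-of-sampling weighting, not necessarily the authors' exact estimator. It assumes the population disease prevalence is known, weights each case by prevalence divided by the sample case fraction, and weights each control by the complementary ratio. The function names `sampling_weights` and `weighted_slope` are hypothetical and introduced here for illustration.

```python
import numpy as np


def sampling_weights(is_case, prevalence):
    """Inverse-probability-of-sampling weights, up to a common constant.

    Assumption: cases and controls are each sampled at random from their
    own stratum, and `prevalence` is the known population disease prevalence.
    """
    is_case = np.asarray(is_case, dtype=bool)
    case_frac = is_case.mean()  # share of cases in the sample
    if not 0.0 < case_frac < 1.0:
        raise ValueError("sample must contain both cases and controls")
    # Cases are usually over-sampled relative to the population,
    # so they are down-weighted; controls are up-weighted.
    return np.where(
        is_case,
        prevalence / case_frac,
        (1.0 - prevalence) / (1.0 - case_frac),
    )


def weighted_slope(genotype, trait, weights):
    """Weighted least-squares slope of a quantitative trait on genotype."""
    g = np.asarray(genotype, dtype=float)
    y = np.asarray(trait, dtype=float)
    w = np.asarray(weights, dtype=float)
    g_bar = np.average(g, weights=w)  # weighted means
    y_bar = np.average(y, weights=w)
    return np.sum(w * (g - g_bar) * (y - y_bar)) / np.sum(w * (g - g_bar) ** 2)


# Toy illustration with made-up values (not data from the paper).
is_case = np.array([1, 1, 0, 0])
genotype = np.array([0, 1, 2, 1])   # minor-allele counts
trait = np.array([1.0, 3.0, 5.0, 3.0])
w = sampling_weights(is_case, prevalence=0.1)
beta = weighted_slope(genotype, trait, w)
```

With these weights the weighted share of cases in the sample equals the assumed prevalence, which is the sense in which the weighted sample mimics the source population. Standard errors from weighted fits generally need a robust (sandwich) estimator, which this sketch omits.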
