Sample Size and Power Estimation in Case-Control Genetic Association Studies

  • Ahn Chul (Department of Medicine, University of Texas Medical School)
  • Published : 2006.06.01

Abstract

When planning a genetic association study, investigators must determine how many samples to collect so that the study has sufficient power to detect the hypothesized effect. The case-control design is increasingly used for genetic association studies because of its simplicity. We review methods for sample size and power calculation in case-control genetic association studies between a marker locus and a disease phenotype.
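
As a rough illustration of the kind of calculation the abstract refers to, the sketch below applies the standard normal-approximation formula for comparing two proportions to allele frequencies in cases and controls. It is a generic textbook approximation and not necessarily any of the methods reviewed in the paper. The function names, the example frequencies (0.30 in cases, 0.20 in controls), and the conversion of alleles to individuals (two alleles per person, which treats alleles as independent and so presumes Hardy-Weinberg equilibrium) are all assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def alleles_per_group(p_case, p_control, alpha=0.05, power=0.80):
    """Alleles needed in each group (equal group sizes) for a two-sided
    two-proportion z-test on allele frequencies. Illustrative only."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    p_bar = (p_case + p_control) / 2  # pooled frequency under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_case * (1 - p_case) + p_control * (1 - p_control))
    )
    return ceil((numerator / (p_case - p_control)) ** 2)


def approx_power(p_case, p_control, n_alleles, alpha=0.05):
    """Approximate power of the same test with n_alleles per group."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    p_bar = (p_case + p_control) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_alleles)
    se_alt = sqrt(
        (p_case * (1 - p_case) + p_control * (1 - p_control)) / n_alleles
    )
    return _Z.cdf((abs(p_case - p_control) - z_alpha * se_null) / se_alt)


if __name__ == "__main__":
    n_alleles = alleles_per_group(0.30, 0.20)
    # Assumption: two independent alleles per individual (Hardy-Weinberg).
    n_people = ceil(n_alleles / 2)
    print(n_alleles, n_people, round(approx_power(0.30, 0.20, n_alleles), 3))
```

The two functions are inverses of each other up to rounding, so the power computed at the returned sample size should sit at or just above the target. Genotype-based tests such as the trend test, and settings with genotyping or phenotype errors, need different formulas.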
