A review of analysis methods for secondary outcomes in case-control studies

  • Schifano, Elizabeth D. (Department of Statistics, University of Connecticut)
  • Received : 2018.08.13
  • Accepted : 2019.01.23
  • Published : 2019.03.31

Abstract

The main goal of a case-control study is to learn the association between various risk factors and a primary outcome (e.g., disease status). In recent years, it has also become common to perform secondary analyses of case-control data in order to understand associations among the risk factors of the primary outcome. It has been repeatedly documented that, with case-control data, association studies of the risk factors that ignore the case-control sampling scheme can produce highly biased estimates of the population effects. In this article, we review the issues with naive secondary analyses that do not account for the biased sampling scheme, as well as the various methods that have been proposed to account for case-control ascertainment. We additionally compare the results of many of the discussed methods in an example examining the association of a particular genetic variant with smoking behavior, using data obtained from a lung cancer case-control study.
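
The bias described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the analysis from the article: the data are simulated, and the function name `simulate_secondary_analysis`, the variant frequency, the effect sizes, and the sample sizes are all hypothetical choices made for illustration. It compares a naive least-squares fit of a secondary outcome on a binary variant, pooled over cases and controls, with an inverse-probability-weighted fit, one commonly used type of correction for case-control ascertainment.

```python
import numpy as np


def simulate_secondary_analysis(seed=0, pop_size=300_000,
                                n_per_group=2_000, beta_true=0.5):
    """Return (naive, ipw) slope estimates for the effect of G on Y.

    All numeric settings are illustrative assumptions, not values
    taken from the article.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical population: binary variant G, continuous secondary
    # outcome Y, and disease status D that depends on both G and Y.
    g = rng.binomial(1, 0.3, size=pop_size)
    y = beta_true * g + rng.normal(size=pop_size)
    p_disease = 1.0 / (1.0 + np.exp(-(-5.0 + 1.0 * g + 1.0 * y)))
    d = rng.binomial(1, p_disease)

    # Case-control sampling: equal numbers of cases and controls,
    # so cases are heavily over-represented relative to the population.
    case_idx = np.flatnonzero(d == 1)
    ctrl_idx = np.flatnonzero(d == 0)
    cases = rng.choice(case_idx, size=n_per_group, replace=False)
    ctrls = rng.choice(ctrl_idx, size=n_per_group, replace=False)
    sample = np.concatenate([cases, ctrls])

    # Inverse sampling fractions. They are known here only because the
    # whole population was simulated.
    weights = np.where(d[sample] == 1,
                       case_idx.size / n_per_group,
                       ctrl_idx.size / n_per_group)

    design = np.column_stack([np.ones(sample.size), g[sample]])
    outcome = y[sample]

    # Naive analysis: ordinary least squares that ignores the sampling.
    naive = np.linalg.lstsq(design, outcome, rcond=None)[0][1]

    # Weighted least squares via rescaling rows by sqrt(weight).
    root_w = np.sqrt(weights)
    ipw = np.linalg.lstsq(design * root_w[:, None],
                          outcome * root_w, rcond=None)[0][1]
    return naive, ipw


if __name__ == "__main__":
    naive_est, ipw_est = simulate_secondary_analysis()
    print(f"true slope: 0.5, naive: {naive_est:.3f}, IPW: {ipw_est:.3f}")
```

In this setup the naive slope is pulled away from the true value because cases both carry the variant more often and have higher values of the secondary outcome, while the weighted fit targets the population slope. In a real study the sampling fractions are not observed directly and would have to come from external information such as the disease prevalence.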
