http://dx.doi.org/10.29220/CSAM.2019.26.2.103

A review of analysis methods for secondary outcomes in case-control studies  

Schifano, Elizabeth D. (Department of Statistics, University of Connecticut)
Publication Information
Communications for Statistical Applications and Methods, v.26, no.2, 2019, pp. 103-129
Abstract
The main goal of a case-control study is to learn the association between various risk factors and a primary outcome (e.g., disease status). It has recently also become common to perform secondary analyses of case-control data in order to understand associations among the risk factors of the primary outcome. It has been repeatedly documented that, with case-control data, association studies of the risk factors that ignore the case-control sampling scheme can produce highly biased estimates of the population effects. In this article, we review the problems with naive secondary analyses that do not account for the biased sampling scheme, as well as the various methods that have been proposed to account for case-control ascertainment. We additionally compare the results of many of the discussed methods in an example examining the association of a particular genetic variant with smoking behavior, where the data were obtained from a lung cancer case-control study.
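As a rough illustration of the bias described above, the following Python sketch simulates a population, draws a case-control sample from it, and compares a naive regression of a secondary outcome on a risk factor with an inverse-probability-weighted one. It does not reproduce any specific method or data set from the review: the disease model, effect sizes, population size, and the function names `weighted_slope` and `simulate` are all assumptions chosen for illustration, and inverse-probability weighting is used here only as one generic way to account for the ascertainment.

```python
import math
import random


def weighted_slope(x, y, w):
    """Slope of a weighted least-squares fit of y on x (with intercept)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def simulate(seed=0, n_pop=400_000, true_slope=0.5):
    """Hypothetical setup: x is a binary risk factor, y a continuous
    secondary outcome, and both raise the risk of the primary disease."""
    rng = random.Random(seed)
    cases, controls = [], []
    for _ in range(n_pop):
        x = 1 if rng.random() < 0.3 else 0
        y = true_slope * x + rng.gauss(0.0, 1.0)
        # Assumed logistic model for the primary outcome (disease status).
        p_disease = 1.0 / (1.0 + math.exp(-(-4.0 + x + y)))
        (cases if rng.random() < p_disease else controls).append((x, y))

    # Case-control sampling: keep every case, sample as many controls.
    sampled_controls = rng.sample(controls, len(cases))
    sample = cases + sampled_controls
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]

    # Naive analysis: pool cases and controls, ignore the sampling scheme.
    naive = weighted_slope(xs, ys, [1.0] * len(sample))

    # Weighted analysis: weight each subject by 1 / P(selected).
    # Cases were kept with probability 1, controls with probability
    # len(cases) / len(controls).
    control_weight = len(controls) / len(cases)
    weights = [1.0] * len(cases) + [control_weight] * len(sampled_controls)
    ipw = weighted_slope(xs, ys, weights)

    return {"true": true_slope, "naive": naive, "ipw": ipw}


if __name__ == "__main__":
    print(simulate())
```

In this assumed setup the pooled sample over-represents subjects with both the risk factor and a high secondary outcome, so the naive slope drifts away from the population value, while the weighted fit stays close to it at the cost of a larger variance. The weights require the sampling fractions to be known, which in a real study typically means knowing or estimating the disease prevalence in the source population.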
Keywords
ascertainment; data reuse; retrospective study; sampling bias; selection bias