http://dx.doi.org/10.29220/CSAM.2018.25.4.385

Multiple imputation for competing risks survival data via pseudo-observations  

Han, Seungbong (Department of Applied Statistics, Gachon University)
Andrei, Adin-Cristian (Department of Preventive Medicine, Northwestern University)
Tsui, Kam-Wah (Department of Statistics, University of Wisconsin-Madison)
Publication Information
Communications for Statistical Applications and Methods, v.25, no.4, 2018, pp. 385-396
Abstract
Competing risks are commonly encountered in biomedical research. Regression models for competing risks data can be developed from data routinely collected in hospitals or general practices. However, these data sets usually contain missing covariate values. To overcome this problem, multiple imputation is often used to fit regression models under a missing at random (MAR) assumption. Here, we introduce a multivariate imputation by chained equations (MICE) algorithm for competing risks survival data. Using pseudo-observations, we make use of the available outcome information while accommodating the competing risks structure. Lastly, we illustrate the practical advantages of our approach using simulations and two data examples, one from coronary artery disease data and one from hepatocellular carcinoma data.
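As a rough illustration of what a pseudo-observation is in this setting, the sketch below computes jackknife pseudo-observations of the cumulative incidence function at a single time point: n times the full-sample estimate minus (n - 1) times the leave-one-out estimate. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names `cumulative_incidence` and `pseudo_observations` are hypothetical, status 0 is assumed to mean censored, and how such values enter the imputation model is not detailed in this abstract.

```python
import numpy as np

def cumulative_incidence(time, status, t, cause=1):
    """Aalen-Johansen estimate of P(T <= t, cause); status 0 is assumed to mean censored."""
    time = np.asarray(time, dtype=float)
    status = np.asarray(status)
    surv = 1.0  # overall event-free survival just before the current event time
    cif = 0.0
    # Loop over distinct event times (any cause) up to and including t.
    for s in np.unique(time[(status != 0) & (time <= t)]):
        at_risk = np.sum(time >= s)
        d_cause = np.sum((time == s) & (status == cause))
        d_any = np.sum((time == s) & (status != 0))
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
    return cif

def pseudo_observations(time, status, t, cause=1):
    """Jackknife pseudo-observations: n * F_hat(t) - (n - 1) * F_hat_without_i(t)."""
    time = np.asarray(time, dtype=float)
    status = np.asarray(status)
    n = len(time)
    full = cumulative_incidence(time, status, t, cause)
    keep = ~np.eye(n, dtype=bool)  # row i is a mask that drops subject i
    loo = np.array([
        cumulative_incidence(time[keep[i]], status[keep[i]], t, cause)
        for i in range(n)
    ])
    return n * full - (n - 1) * loo

# Toy data (made up for illustration): no censoring, causes 1 and 2.
toy_time = [1, 2, 3, 4, 5]
toy_status = [1, 2, 1, 2, 1]
toy_pseudo = pseudo_observations(toy_time, toy_status, t=3.5, cause=1)
```

Without censoring, each pseudo-observation reduces to the indicator that the subject failed from the cause of interest by time t. With censoring, the values are defined for every subject, including censored ones, which is what lets them stand in for the incompletely observed outcome.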
Keywords
competing risks; missing data; multiple imputation; pseudo-observations; random forest