Multiple imputation for competing risks survival data via pseudo-observations

  • Received : 2018.03.08
  • Accepted : 2018.05.16
  • Published : 2018.07.31

Abstract

Competing risks are commonly encountered in biomedical research. Regression models for competing risks data can be developed from data routinely collected in hospitals or general practices. However, these data sets usually contain missing covariate values. To overcome this problem, multiple imputation is often used to fit regression models under a missing at random (MAR) assumption. Here, we introduce a multivariate imputation by chained equations (MICE) algorithm for competing risks survival data. Using pseudo-observations, we make use of the available outcome information while accommodating the competing risks structure. Lastly, we illustrate the practical advantages of our approach using simulations and two data examples, one on coronary artery disease and one on hepatocellular carcinoma.
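
The abstract does not spell out how the pseudo-observations are computed. As an illustration only, the following is a minimal sketch of the standard jackknife construction for the cumulative incidence function at a fixed time point, assuming the Aalen-Johansen estimator as the base estimator. The function names `cumulative_incidence` and `pseudo_observations` and the status coding (0 for censored, positive integers for causes) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def cumulative_incidence(time, status, t, cause=1):
    """Aalen-Johansen estimate of P(T <= t, failure from `cause`).

    Assumed coding: status 0 = censored, 1, 2, ... = cause of failure.
    """
    time = np.asarray(time, dtype=float)
    status = np.asarray(status)
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    # Loop over the distinct observed event times up to t.
    for u in np.unique(time[(status != 0) & (time <= t)]):
        at_risk = np.sum(time >= u)
        d_cause = np.sum((time == u) & (status == cause))
        d_any = np.sum((time == u) & (status != 0))
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
    return cif


def pseudo_observations(time, status, t, cause=1):
    """Jackknife pseudo-observations n*F(t) - (n-1)*F_(-i)(t) for each subject."""
    time = np.asarray(time, dtype=float)
    status = np.asarray(status)
    n = len(time)
    full = cumulative_incidence(time, status, t, cause)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False  # leave subject i out
        loo = cumulative_incidence(time[keep], status[keep], t, cause)
        pseudo[i] = n * full - (n - 1) * loo
        keep[i] = True
    return pseudo


if __name__ == "__main__":
    # Toy data with censoring (status 0) and two competing causes.
    time = [2.0, 3.5, 4.0, 5.0, 6.5, 7.0]
    status = [1, 0, 2, 1, 0, 1]
    print(pseudo_observations(time, status, t=5.0, cause=1))
```

Each subject receives a pseudo-observation whether or not the event time is censored, which is what lets the outcome information enter a chained-equations imputation model as an ordinary numeric variable; how exactly the paper uses these values inside the imputation step is not stated in the abstract. Without censoring, the pseudo-observation reduces to the indicator that the subject failed from the cause of interest by time t.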
