Review of Statistical Methods for Evaluating the Performance of Survival or Other Time-to-Event Prediction Models (from Conventional to Deep Learning Approaches)

  • Seo Young Park (Department of Statistics and Data Science, Korea National Open University) ;
  • Ji Eun Park (Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center) ;
  • Hyungjin Kim (Department of Radiology, Seoul National University College of Medicine, Seoul National University Hospital) ;
  • Seong Ho Park (Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center)
  • Received : 2021.03.20
  • Accepted : 2021.05.17
  • Published : 2021.10.01

Abstract

The recent introduction of various high-dimensional modeling methods, such as radiomics and deep learning, has greatly diversified the modeling approaches used for survival prediction (or, more generally, time-to-event prediction). Because these approaches are new and their model outputs are unfamiliar, some researchers and practitioners may be unsure how the performance of such models should be evaluated. Anyone who wants to develop these models or apply them in practice needs the methodological literacy to critically appraise how their performance has been evaluated and, ideally, the ability to conduct such an evaluation themselves. This article provides intuitive, conceptual, and practical explanations of the statistical methods for evaluating the performance of survival prediction models, with minimal use of mathematical notation. It covers models ranging from conventional to deep learning approaches, with an emphasis on recent modeling approaches. The review offers straightforward explanations of C indices (Harrell's C index, etc.), time-dependent receiver operating characteristic curve analysis, calibration plots, other methods for evaluating calibration performance, and the Brier score.
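As a minimal illustration of the first measure listed above, the sketch below computes Harrell's C index in plain Python. It assumes the model outputs a single risk score per patient, where a higher value means a higher risk (an expected shorter time to event), and it simply skips pairs with tied observed times; published implementations differ in how they treat such ties. The function name `harrell_c_index` and these conventions are illustrative assumptions, not the article's own code.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C index for right-censored survival data (simplified sketch).

    times:       observed follow-up time for each patient
    events:      1 if the event was observed, 0 if the patient was censored
    risk_scores: model output; a HIGHER value is assumed to mean HIGHER risk
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    usable = 0
    # O(n^2) loop over all patient pairs; fine for illustration only.
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is usable only if the shorter time is an observed event.
        # Pairs with tied observed times are skipped in this sketch.
        if times[a] == times[b] or events[a] == 0:
            continue
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the higher-risk patient failed first
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable (comparable) pairs")
    return concordant / usable
```

For example, `harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.5, 0.7, 0.1])` returns 0.8: five pairs are usable (the pair of the patients followed for 6 and 8 is not, because the earlier time is censored), and only the pair of the patients followed for 4 and 6 is discordant, since the patient who had the event earlier was given the lower risk score. A value of 0.5 corresponds to ordering no better than chance, and 1.0 to perfect ordering.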
