Comparison of survival prediction models for pancreatic cancer: Cox model versus machine learning models

  • Kim, Hyunsuk (Department of Statistics, University of California)
  • Park, Taesung (Department of Statistics, Seoul National University)
  • Jang, Jinyoung (Department of Surgery and Cancer Research Institute, Seoul National University College of Medicine)
  • Lee, Seungyeoun (Department of Mathematics and Statistics, Sejong University)
  • Received: 2022.05.23
  • Accepted: 2022.06.08
  • Published: 2022.06.30

Abstract

A survival prediction model based on a Cox model was recently developed to evaluate the prognosis of resected nonmetastatic pancreatic ductal adenocarcinoma using two nationwide databases: Surveillance, Epidemiology and End Results (SEER) and Korea Tumor Registry System-Biliary Pancreas (KOTUS-BP). In this study, we applied two machine learning methods for survival analysis, random survival forests (RSF) and support vector machines (SVM), and compared their prediction performance with that of the Cox model using the SEER and KOTUS-BP datasets. Three schemes were used for model development and evaluation. First, we used data from SEER for model development and data from KOTUS-BP for external evaluation. Second, the two datasets were swapped: data from KOTUS-BP were used for model development and data from SEER for external evaluation. Finally, we mixed the two datasets half and half and used the mixed datasets for model development and validation. We used 9,624 patients from SEER and 3,281 patients from KOTUS-BP to construct a prediction model with seven covariates: age, sex, histologic differentiation, adjuvant treatment, resection margin status, and the American Joint Committee on Cancer 8th edition T-stage and N-stage. Across the three schemes, the Cox model, RSF, and SVM all performed better with the mixed datasets than with the unmixed datasets. With the mixed datasets, the C-index and the 1-year, 2-year, and 3-year time-dependent areas under the curve (AUC) for the Cox model were 0.644, 0.698, 0.680, and 0.687, respectively. The Cox model performed slightly better than RSF and SVM.
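As an illustration of the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' code: the abstract does not state which C-index estimator or software was used, so Harrell's definition is an assumption, and the function name `harrell_c_index` and the toy data are hypothetical. Pairs with tied follow-up times are skipped for simplicity.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up time per patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Tied times are skipped (a simplification made for this sketch).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk died earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the second one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.1, 0.5, 0.7]
print(harrell_c_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported Cox model value of 0.644 in context. The pairwise loop is quadratic in the number of patients, so an optimized library routine would be preferable for cohorts of the size described here.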

Acknowledgement

This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI16C2037).
