Development and Validation of a Prediction Model: Application to Digestive Cancer Research

  • Yonghan Kwon (Department of Biostatistics and Computing, Yonsei University Graduate School) ;
  • Kyunghwa Han (Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine)
  • Received : 2023.11.01
  • Accepted : 2023.12.04
  • Published : 2023.12.20

Abstract

Prediction is an important topic in clinical research, and studies on the development and validation of prediction models are increasingly being published. In this review, we surveyed the analytical methods and validation schemes used for clinical prediction models in digestive cancer research. Deep learning and logistic regression, combined with split-sample validation as internal or external validation, were the most commonly used approaches. We also briefly introduce each method and summarize its advantages and disadvantages. Finally, we discuss several points to consider when conducting prediction model studies.
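As an illustration of the most common approach identified in the review, a logistic regression model evaluated with split-sample validation can be sketched as follows. This is a minimal, hypothetical example using simulated data and scikit-learn; the predictor names, sample size, and split ratio are illustrative assumptions, not taken from any study in the review.

```python
# Illustrative sketch: logistic regression with split-sample validation.
# Data are simulated; in practice X would hold patient predictors and y the outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))            # three hypothetical predictors
logit = 0.8 * X[:, 0] - 0.5 * X[:, 1]  # assumed true linear predictor
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Split-sample validation: hold out 30% of patients as a validation set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Develop the model on the training split only
model = LogisticRegression().fit(X_train, y_train)

# Assess discrimination (AUC) on the held-out split
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Validation AUC: {auc:.2f}")
```

Because the held-out split comes from the same source population, this corresponds to internal validation; external validation would instead apply the frozen model to data from a different center or period.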

Acknowledgement

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education in 2021 (No. NRF-2021R1I1A1A01059893). The funder had no role in study design, data collection and analysis, or the preparation of and decision to publish the manuscript.

References

  1. Yang C, Kors JA, Ioannou S, et al. Trends in the conduct and reporting of clinical prediction model development and validation: a systematic review. J Am Med Inform Assoc 2022;29:983-989. https://doi.org/10.1093/jamia/ocac002
  2. van den Boorn HG, Engelhardt EG, van Kleef J, et al. Prediction models for patients with esophageal or gastric cancer: a systematic review and meta-analysis. PLoS One 2018;13:e0192310. https://doi.org/10.1371/journal.pone.0192310
  3. Backes Y, Schwartz MP, Ter Borg F, et al. Multicentre prospective evaluation of real-time optical diagnosis of T1 colorectal cancer in large non-pedunculated colorectal polyps using narrow band imaging (the OPTICAL study). Gut 2019;68:271-279. https://doi.org/10.1136/gutjnl-2017-314723
  4. Bae JS, Chang W, Kim SH, et al. Development of a predictive model for extragastric recurrence after curative resection for early gastric cancer. Gastric Cancer 2022;25:255-264. https://doi.org/10.1007/s10120-021-01217-1
  5. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162:55-63. https://doi.org/10.7326/m14-0697
  6. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436-444. https://doi.org/10.1038/nature14539
  7. Shrestha A, Mahmood A. Review of deep learning algorithms and architectures. IEEE Access 2019;7:53040-53065. https://doi.org/10.1109/ACCESS.2019.2912200
  8. Gong EJ, Bang CS, Lee JJ, et al. Deep learning-based clinical decision support system for gastric neoplasms in real-time endoscopy: development and validation study. Endoscopy 2023;55:701-708. https://doi.org/10.1055/a-2031-0691
  9. Agresti A. Categorical data analysis. Hoboken: John Wiley & Sons, 2012.
  10. Geng ZH, Zhu Y, Li QL, et al. Muscular injury as an independent risk factor for esophageal stenosis after endoscopic submucosal dissection of esophageal squamous cell cancer. Gastrointest Endosc 2023;98:534-542.e7. https://doi.org/10.1016/j.gie.2023.05.046
  11. Cox DR. Regression models and life-tables. J Royal Stat Soc Ser B 1972;34:187-220.
  12. George B, Seals S, Aban I. Survival analysis and regression models. J Nucl Cardiol 2014;21:686-694. https://doi.org/10.1007/s12350-014-9908-2
  13. Breiman L. Random forests. Mach Learn 2001;45:5-32. https://doi.org/10.1023/A:1010933404324
  14. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. New York: Springer, 2009. p.587-604.
  15. Ziegler A, Konig IR. Mining data with random forests: current options for real-world applications. WIREs 2014;4:55-63. https://doi.org/10.1002/widm.1114
  16. Liwinski T, Casar C, Ruehlemann MC, et al. A disease-specific decline of the relative abundance of Bifidobacterium in patients with autoimmune hepatitis. Aliment Pharmacol Ther 2020;51:1417-1428. https://doi.org/10.1111/apt.15754
  17. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Statist 2001;29:1189-1232. https://doi.org/10.1214/aos/1013203451
  18. Chen T, He T, Benesty M, et al. Xgboost: extreme gradient boosting. R package version 0.4-2. 2015. https://xgboost.readthedocs.io/en/stable/R-package/xgboostPresentation.html (accessed Oct 1, 2023).
  19. Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree [abstract]. In: Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017 Dec 4-9; Long Beach, USA. p.3149-3157.
  20. Sagi O, Rokach L. Approximating XGBoost with an interpretable decision tree. Inf Sci 2021;572:522-542. https://doi.org/10.1016/j.ins.2021.05.055
  21. Kwon Y, Kwon JW, Ha J, et al. Remission of type 2 diabetes after gastrectomy for gastric cancer: diabetes prediction score. Gastric Cancer 2022;25:265-274. https://doi.org/10.1007/s10120-021-01216-2
  22. Tibshirani R. Regression shrinkage and selection via the lasso. J Royal Stat Soc Ser B 1996;58:267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
  23. Hastie T, Tibshirani R, Wainwright M. Statistical learning with sparsity: the lasso and generalizations. Boca Raton: CRC Press, 2015.
  24. Zou H, Hastie T. Regularization and variable selection via the elastic net. J Royal Stat Soc Ser B 2005;67:301-320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
  25. Ali H, Patel P, Malik TF, et al. Endoscopic sleeve gastroplasty reintervention score using supervised machine learning. Gastrointest Endosc 2023;98:747-754.e5. https://doi.org/10.1016/j.gie.2023.05.059
  26. Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intell Syst Appl 1998;13:18-28. https://doi.org/10.1109/5254.708428
  27. Chen PH, Lin CJ, Scholkopf B. A tutorial on ν-support vector machines. Appl Stoch Models Bus Ind 2005;21:111-136. https://doi.org/10.1002/asmb.537
  28. Salcedo-Sanz S, Rojo-Alvarez JL, Martinez-Ramon M, Camps-Valls G. Support vector machines in engineering: an overview. WIREs 2014;4:234-267. https://doi.org/10.1002/widm.1125
  29. Yu S, Li Y, Liao Z, et al. Plasma extracellular vesicle long RNA profiling identifies a diagnostic signature for the detection of pancreatic ductal adenocarcinoma. Gut 2020;69:540-550. https://doi.org/10.1136/gutjnl-2019-318860
  30. Bleeker SE, Moll HA, Steyerberg EW, et al. External validation is necessary in prediction research: a clinical example. J Clin Epidemiol 2003;56:826-832. https://doi.org/10.1016/s0895-4356(03)00207-5
  31. Xu Y, Goodacre R. On splitting training and validation set: a comparative study of cross-validation, bootstrap and systematic sampling for estimating the generalization performance of supervised learning. J Anal Test 2018;2:249-262. https://doi.org/10.1007/s41664-018-0068-2
  32. Gholamy A, Kreinovich V, Kosheleva O. Why 70/30 or 80/20 relation between training and testing sets: a pedagogical explanation. 2018 Feb. Report No.: UTEP-CS-18-09.
  33. Prechelt L. Early stopping-but when? In: Orr GB, Muller KR, eds. Neural networks: tricks of the trade. Berlin, Heidelberg: Springer, 2002:55-69.
  34. Berrar D. Cross-validation. Encycl Bioinform Comput Biol 2019;1:542-545. https://doi.org/10.1016/B978-0-12-809633-8.20349-X
  35. Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman and Hall, 1994.
  36. Harrell FE. Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Cham: Springer International Publishing, 2015.
  37. Kuhn M, Johnson K. Applied predictive modeling. New York: Springer, 2013.
  38. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ 2020;368:m441. https://doi.org/10.1136/bmj.m441
  39. Van Calster B, Steyerberg EW, Wynants L, van Smeden M. There is no such thing as a validated prediction model. BMC Med 2023;21:70. https://doi.org/10.1186/s12916-023-02779-w
  40. Moons KG, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015;162:W1-W73. https://doi.org/10.7326/M14-0698
  41. Collins GS, Moons KGM. Reporting of artificial intelligence prediction models. Lancet 2019;393:1577-1579. https://doi.org/10.1016/S0140-6736(19)30037-6