Development and Validation of a Prediction Model: Application to Digestive Cancer Research

  • Yonghan Kwon (Department of Biostatistics and Computing, Yonsei University Graduate School) ;
  • Kyunghwa Han (Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine)
  • Received : 2023.11.01
  • Accepted : 2023.12.04
  • Published : 2023.12.20

Abstract

Prediction is an important topic in clinical research, and studies on the development and validation of prediction models are increasingly being published. In this review, we surveyed the analytical methods and validation schemes used for clinical prediction models in digestive cancer research. Deep learning and logistic regression, combined with split-sample validation as internal or external validation, were the most commonly used approaches. We also briefly introduce each method and summarize its advantages and disadvantages. Finally, we discuss several points to consider when conducting prediction model studies.
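As an illustration of the most common approach identified in the review, a logistic regression model evaluated with split-sample validation can be sketched as follows. This is a minimal, hypothetical example using simulated data and scikit-learn; the predictor names, sample size, and split ratio are illustrative assumptions, not taken from any study in the review.

```python
# Illustrative sketch: logistic regression with split-sample validation.
# Data are simulated; in practice X would hold patient predictors and y the outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))            # three hypothetical predictors
logit = 0.8 * X[:, 0] - 0.5 * X[:, 1]  # assumed true linear predictor
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Split-sample validation: hold out 30% of patients as a validation set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Develop the model on the training split only
model = LogisticRegression().fit(X_train, y_train)

# Assess discrimination (AUC) on the held-out split
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Validation AUC: {auc:.2f}")
```

Because the held-out split comes from the same source population, this corresponds to internal validation; external validation would instead apply the frozen model to data from a different center or period.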

Acknowledgement

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education in 2021 (No. NRF-2021R1I1A1A01059893). The funder had no role in study design, data collection and analysis, or the preparation of and decision to publish the manuscript.

References

  1. Yang C, Kors JA, Ioannou S, et al. Trends in the conduct and reporting of clinical prediction model development and validation: a systematic review. J Am Med Inform Assoc 2022;29:983-989. https://doi.org/10.1093/jamia/ocac002
  2. van den Boorn HG, Engelhardt EG, van Kleef J, et al. Prediction models for patients with esophageal or gastric cancer: a systematic review and meta-analysis. PLoS One 2018;13:e0192310. https://doi.org/10.1371/journal.pone.0192310
  3. Backes Y, Schwartz MP, Ter Borg F, et al. Multicentre prospective evaluation of real-time optical diagnosis of T1 colorectal cancer in large non-pedunculated colorectal polyps using narrow band imaging (the OPTICAL study). Gut 2019;68:271-279. https://doi.org/10.1136/gutjnl-2017-314723
  4. Bae JS, Chang W, Kim SH, et al. Development of a predictive model for extragastric recurrence after curative resection for early gastric cancer. Gastric Cancer 2022;25:255-264. https://doi.org/10.1007/s10120-021-01217-1
  5. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162:55-63. https://doi.org/10.7326/m14-0697
  6. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436-444. https://doi.org/10.1038/nature14539
  7. Shrestha A, Mahmood A. Review of deep learning algorithms and architectures. IEEE Access 2019;7:53040-53065. https://doi.org/10.1109/ACCESS.2019.2912200
  8. Gong EJ, Bang CS, Lee JJ, et al. Deep learning-based clinical decision support system for gastric neoplasms in real-time endoscopy: development and validation study. Endoscopy 2023;55:701-708. https://doi.org/10.1055/a-2031-0691
  9. Agresti A. Categorical data analysis. Hoboken: John Wiley & Sons, 2012.
  10. Geng ZH, Zhu Y, Li QL, et al. Muscular injury as an independent risk factor for esophageal stenosis after endoscopic submucosal dissection of esophageal squamous cell cancer. Gastrointest Endosc 2023;98:534-542.e7. https://doi.org/10.1016/j.gie.2023.05.046
  11. Cox DR. Regression models and life-tables. J Royal Stat Soc Ser B 1972;34:187-220.
  12. George B, Seals S, Aban I. Survival analysis and regression models. J Nucl Cardiol 2014;21:686-694. https://doi.org/10.1007/s12350-014-9908-2
  13. Breiman L. Random forests. Mach Learn 2001;45:5-32. https://doi.org/10.1023/A:1010933404324
  14. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. New York: Springer, 2009. p.587-604.
  15. Ziegler A, Konig IR. Mining data with random forests: current options for real-world applications. WIREs 2014;4:55-63. https://doi.org/10.1002/widm.1114
  16. Liwinski T, Casar C, Ruehlemann MC, et al. A disease-specific decline of the relative abundance of Bifidobacterium in patients with autoimmune hepatitis. Aliment Pharmacol Ther 2020;51:1417-1428. https://doi.org/10.1111/apt.15754
  17. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Statist 2001;29:1189-1232. https://doi.org/10.1214/aos/1013203451
  18. Chen T, He T, Benesty M, et al. Xgboost: extreme gradient boosting. R package version 0.4-2. 2015. https://xgboost.readthedocs.io/en/stable/R-package/xgboostPresentation.html (accessed Oct 1, 2023).
  19. Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree [abstract]. In: Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017 Dec 4-9; Long Beach, USA. p.3149-3157.
  20. Sagi O, Rokach L. Approximating XGBoost with an interpretable decision tree. Inf Sci 2021;572:522-542. https://doi.org/10.1016/j.ins.2021.05.055
  21. Kwon Y, Kwon JW, Ha J, et al. Remission of type 2 diabetes after gastrectomy for gastric cancer: diabetes prediction score. Gastric Cancer 2022;25:265-274. https://doi.org/10.1007/s10120-021-01216-2
  22. Tibshirani R. Regression shrinkage and selection via the lasso. J Royal Stat Soc Ser B 1996;58:267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
  23. Hastie T, Tibshirani R, Wainwright M. Statistical learning with sparsity: the lasso and generalizations. Boca Raton: CRC Press, 2015.
  24. Zou H, Hastie T. Regularization and variable selection via the elastic net. J Royal Stat Soc Ser B 2005;67:301-320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
  25. Ali H, Patel P, Malik TF, et al. Endoscopic sleeve gastroplasty reintervention score using supervised machine learning. Gastrointest Endosc 2023;98:747-754.e5. https://doi.org/10.1016/j.gie.2023.05.059
  26. Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intell Syst Appl 1998;13:18-28. https://doi.org/10.1109/5254.708428
  27. Chen PH, Lin CJ, Scholkopf B. A tutorial on ν-support vector machines. Appl Stoch Models Bus Ind 2005;21:111-136. https://doi.org/10.1002/asmb.537
  28. Salcedo-Sanz S, Rojo-Alvarez JL, Martinez-Ramon M, Camps-Valls G. Support vector machines in engineering: an overview. WIREs 2014;4:234-267. https://doi.org/10.1002/widm.1125
  29. Yu S, Li Y, Liao Z, et al. Plasma extracellular vesicle long RNA profiling identifies a diagnostic signature for the detection of pancreatic ductal adenocarcinoma. Gut 2020;69:540-550. https://doi.org/10.1136/gutjnl-2019-318860
  30. Bleeker SE, Moll HA, Steyerberg EW, et al. External validation is necessary in prediction research: a clinical example. J Clin Epidemiol 2003;56:826-832. https://doi.org/10.1016/s0895-4356(03)00207-5
  31. Xu Y, Goodacre R. On splitting training and validation set: a comparative study of cross-validation, bootstrap and systematic sampling for estimating the generalization performance of supervised learning. J Anal Test 2018;2:249-262. https://doi.org/10.1007/s41664-018-0068-2
  32. Gholamy A, Kreinovich V, Kosheleva O. Why 70/30 or 80/20 relation between training and testing sets: a pedagogical explanation. 2018 Feb. Report No.: UTEP-CS-18-09.
  33. Prechelt L. Early stopping-but when? In: Orr GB, Muller KR, eds. Neural networks: tricks of the trade. Berlin, Heidelberg: Springer, 2002:55-69.
  34. Berrar D. Cross-validation. Encycl Bioinform Comput Biol 2019;1:542-545. https://doi.org/10.1016/B978-0-12-809633-8.20349-X
  35. Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman and Hall, 1994.
  36. Harrell FE. Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Cham: Springer International Publishing, 2015.
  37. Kuhn M, Johnson K. Applied predictive modeling. New York: Springer, 2013.
  38. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ 2020;368:m441. https://doi.org/10.1136/bmj.m441
  39. Van Calster B, Steyerberg EW, Wynants L, van Smeden M. There is no such thing as a validated prediction model. BMC Med 2023;21:70. https://doi.org/10.1186/s12916-023-02779-w
  40. Moons KG, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015;162:W1-W73. https://doi.org/10.7326/M14-0698
  41. Collins GS, Moons KGM. Reporting of artificial intelligence prediction models. Lancet 2019;393:1577-1579. https://doi.org/10.1016/S0140-6736(19)30037-6