Machine Learning vs. Statistical Model for Prediction Modelling: Application in Medical Imaging Research

  • Leeha Ryu (Department of Biostatistics and Computing, Yonsei University Graduate School) ;
  • Kyunghwa Han (Department of Radiology, Research Institute of Radiological Science, and Center for Clinical Imaging Data Science, Yonsei University College of Medicine)
  • Received : 2022.08.09
  • Accepted : 2022.11.13
  • Published : 2022.11.01

Abstract

Clinical prediction models that incorporate imaging parameters are increasingly published in radiology research. In particular, as radiomics research is actively conducted, prediction models are being developed not only with traditional statistical models but also with machine learning to account for high-dimensional data. In this review, we surveyed the statistical and machine learning methods used in clinical prediction model research in radiology and briefly summarized each analytical approach (statistical models, machine learning, and statistical learning), including its strengths and limitations. Finally, we discussed several considerations for choosing a prediction modeling method.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2021R1I1A1A01059893).
