Key Principles of Clinical Validation, Device Approval, and Insurance Coverage Decisions of Artificial Intelligence

Seong Ho Park; Jaesoon Choi; Jeong-Sik Byeon

doi:10.3348/kjr.2021.0048

Korean Journal of Radiology

Volume 22, Issue 3 / Pages 442-453 / 2021 / pISSN 1229-6929 / eISSN 2005-8330

The Korean Society of Radiology (대한영상의학회)

Seong Ho Park (Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine);
Jaesoon Choi (Department of Biomedical Engineering, Asan Medical Center, University of Ulsan College of Medicine);
Jeong-Sik Byeon (Department of Gastroenterology, Asan Medical Center, University of Ulsan College of Medicine)

Received: 2021.01.15
Accepted: 2021.01.15
Published: 2021.03.01

https://doi.org/10.3348/kjr.2021.0048

Abstract

Artificial intelligence (AI) will likely affect various fields of medicine. This article aims to explain the fundamental principles of clinical validation, device approval, and insurance coverage decisions of AI algorithms for medical diagnosis and prediction. Discrimination accuracy of AI algorithms is often evaluated with the Dice similarity coefficient, sensitivity, specificity, and traditional or free-response receiver operating characteristic curves. Calibration accuracy should also be assessed, especially for algorithms that provide probabilities to users. As current AI algorithms have limited generalizability to real-world practice, clinical validation of AI should subject it to proper external testing and place it in assisting roles. External testing could adopt diagnostic case-control or diagnostic cohort designs. A diagnostic case-control study evaluates the technical validity/accuracy of AI, while a diagnostic cohort study tests the clinical validity/accuracy of AI in samples representing target patients in real-world clinical scenarios. Ultimate clinical validation of AI requires evaluating its impact on patient outcomes, referred to as clinical utility, for which randomized clinical trials are ideal. Device approval of AI is typically granted with proof of technical validity/accuracy and thus is not intended to directly indicate whether AI is beneficial for patient care or improves patient outcomes. Nor can it categorically address the issue of limited generalizability of AI. After device approval, it is up to medical professionals to determine whether the approved AI algorithms are beneficial for real-world patient care. Insurance coverage decisions generally require a demonstration of clinical utility, that is, evidence that the use of AI improves patient outcomes.
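
As a rough illustration of the accuracy measures named in the abstract, the following Python sketch computes a Dice similarity coefficient, sensitivity and specificity, and one simple calibration summary. It is a minimal sketch under stated assumptions, not the authors' method: the function names and toy data are hypothetical, and "calibration-in-the-large" (mean predicted probability minus observed event rate) is only one of several ways calibration could be assessed; the abstract does not specify which to use.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2|A and B| / (|A| + |B|) for two binary masks of equal length."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    # Convention assumed here: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total


def sensitivity_specificity(pred_labels, true_labels):
    """Return (sensitivity, specificity) for binary predictions."""
    tp = fn = tn = fp = 0
    for p, t in zip(pred_labels, true_labels):
        if t and p:
            tp += 1
        elif t and not p:
            fn += 1
        elif not t and not p:
            tn += 1
        else:
            fp += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def calibration_in_the_large(pred_probs, true_labels):
    """Mean predicted probability minus observed event rate.

    A positive value suggests the algorithm overestimates risk on average.
    """
    n = len(true_labels)
    return sum(pred_probs) / n - sum(true_labels) / n


# Hypothetical toy data, for illustration only.
dice = dice_coefficient([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
sens, spec = sensitivity_specificity([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
citl = calibration_in_the_large([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0])
print(round(dice, 3), round(sens, 3), round(spec, 3), citl)
```

In the toy calibration data, the algorithm predicts 0.5 for every case while only one in four cases has the event, so the summary is positive, indicating overestimated probabilities even though discrimination measures alone would not reveal this.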

Acknowledgement

This article is a republication of the original paper published in Korean in the Journal of the Korean Medical Association (J Korean Med Assoc 2020;63:696-708), translated into English with the original publisher's consent.

