Improving the Performance of Radiologists Using Artificial Intelligence-Based Detection Support Software for Mammography: A Multi-Reader Study

  • Jeong Hoon Lee (Lunit Inc.) ;
  • Ki Hwan Kim (Lunit Inc.) ;
  • Eun Hye Lee (Department of Radiology, Soonchunhyang University Bucheon Hospital, Soonchunhyang University College of Medicine) ;
  • Jong Seok Ahn (Lunit Inc.) ;
  • Jung Kyu Ryu (Department of Radiology, Kyung Hee University Hospital at Gangdong) ;
  • Young Mi Park (Department of Radiology, Inje University Busan Paik Hospital, Inje University College of Medicine) ;
  • Gi Won Shin (Department of Radiology, Inje University Busan Paik Hospital, Inje University College of Medicine) ;
  • Young Joong Kim (Department of Radiology, Konyang University Hospital, Konyang University College of Medicine) ;
  • Hye Young Choi (Department of Radiology, Gyeongsang National University Hospital and College of Medicine, Gyeongsang National University)
  • Received : 2021.03.10
  • Accepted : 2022.01.24
  • Published : 2022.05.01

Abstract

Objective: To evaluate whether artificial intelligence (AI) for detecting breast cancer on mammography can improve the performance and time efficiency of radiologists reading mammograms.

Materials and Methods: A commercial deep learning-based software for mammography was validated using external data collected from 200 patients at one hospital: 100 with breast cancer and 100 without (40 with benign lesions and 60 with no lesions). Ten readers, comprising five breast specialist radiologists (BSRs) and five general radiologists (GRs), assessed all mammography images using a seven-point scale to rate the likelihood of malignancy in two sessions, with and without the aid of the AI-based software, and the reading time was automatically recorded using a web-based reporting system. The two reading sessions were separated by a two-month washout period. Differences in the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and reading time between reading with and without AI were analyzed, accounting for clustering of data by reader where indicated.

Results: The AUROC of AI alone, the BSR group (averaged across five readers), and the GR group (averaged across five readers) was 0.915 (95% confidence interval, 0.876-0.954), 0.813 (0.756-0.870), and 0.684 (0.616-0.752), respectively. With AI assistance, the AUROC increased significantly to 0.884 (0.840-0.928) in the BSR group and 0.833 (0.779-0.887) in the GR group (p = 0.007 and p < 0.001, respectively). Sensitivity improved with AI assistance in both groups (74.6% vs. 88.6% for BSRs, p < 0.001; 52.1% vs. 79.4% for GRs, p < 0.001), whereas specificity did not differ significantly (66.6% vs. 66.4% for BSRs, p = 0.238; 70.8% vs. 70.0% for GRs, p = 0.689). The average reading time pooled across readers decreased significantly with AI assistance for BSRs (82.73 vs. 73.04 seconds, p < 0.001) but increased for GRs (35.44 vs. 42.52 seconds, p < 0.001).

Conclusion: AI-based software improved the performance of radiologists regardless of their experience, and it affected reading time in opposite directions for the two groups: shorter for BSRs and longer for GRs.
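To make the AUROC comparison concrete, the following is a minimal illustrative sketch in Python (using NumPy and scikit-learn, chosen here for convenience; the simulated seven-point scores are hypothetical, not the study data) of how a single reader's AUROC gain from AI assistance could be estimated with a case-paired bootstrap. The study's actual analysis used multireader multicase methodology, which additionally accounts for variability across readers.

```python
# Illustrative sketch only (not the authors' code): paired bootstrap
# comparison of one reader's AUROC with vs. without AI assistance,
# using simulated seven-point likelihood-of-malignancy ratings.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical data: 200 cases (1 = cancer), seven-point scores per session.
y = np.concatenate([np.ones(100), np.zeros(100)]).astype(int)
score_unaided = np.clip(np.rint(rng.normal(4 + 1.0 * y, 1.5)), 1, 7)
score_aided = np.clip(np.rint(rng.normal(4 + 1.6 * y, 1.5)), 1, 7)

def auc_diff(idx):
    """AUROC(aided) - AUROC(unaided) on a resampled set of cases."""
    return (roc_auc_score(y[idx], score_aided[idx])
            - roc_auc_score(y[idx], score_unaided[idx]))

observed = auc_diff(np.arange(len(y)))

# Paired bootstrap over cases: resample case indices and keep both
# sessions' scores for each drawn case, preserving the pairing.
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y), len(y))
    if len(np.unique(y[idx])) == 2:  # AUROC needs both classes present
        boot.append(auc_diff(idx))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUROC gain = {observed:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```

The pairing matters: resampling whole cases, rather than the two sessions independently, preserves the within-case correlation between the aided and unaided reads, which is the same dependence the study's clustered analyses account for.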


Acknowledgement

The English in this document has been checked by at least two professional editors, both native speakers of English. For a certificate, please see: http://www.textcheck.com/certificate/ILXoRO. The scientific guarantor of this publication is the corresponding author, Eun Hye Lee.
