• Title/Abstract/Keywords: Performance objective

Artificial Intelligence-Based Identification of Normal Chest Radiographs: A Simulation Study in a Multicenter Health Screening Cohort

  • Hyunsuk Yoo;Eun Young Kim;Hyungjin Kim;Ye Ra Choi;Moon Young Kim;Sung Ho Hwang;Young Joong Kim;Young Jun Cho;Kwang Nam Jin
    • Korean Journal of Radiology / Vol. 23, No. 10 / pp. 1009-1018 / 2022
  • Objective: This study aimed to investigate the feasibility of using artificial intelligence (AI) to identify normal chest radiography (CXR) from the worklist of radiologists in a health-screening environment. Materials and Methods: This retrospective simulation study was conducted using the CXRs of 5887 adults (mean age ± standard deviation, 55.4 ± 11.8 years; male, 4329) from three health screening centers in South Korea using a commercial AI (Lunit INSIGHT CXR3, version 3.5.8.8). Three board-certified thoracic radiologists reviewed CXR images for referable thoracic abnormalities and grouped the images into those with visible referable abnormalities (identified as abnormal by at least one reader) and those with clearly visible referable abnormalities (identified as abnormal by at least two readers). With AI-based simulated exclusion of normal CXR images, the percentages of normal images sorted and abnormal images erroneously removed were analyzed. Additionally, in a random subsample of 480 patients, the ability to identify visible referable abnormalities was compared among AI-unassisted reading (i.e., all images read by human readers without AI), AI-assisted reading (i.e., all images read by human readers with AI assistance as concurrent readers), and reading with AI triage (i.e., human reading of only those rendered abnormal by AI). Results: Of 5887 CXR images, 405 (6.9%) and 227 (3.9%) contained visible and clearly visible abnormalities, respectively. With AI-based triage, 42.9% (2354/5482) of normal CXR images were removed at the cost of erroneous removal of 3.5% (14/405) and 1.8% (4/227) of CXR images with visible and clearly visible abnormalities, respectively. In the diagnostic performance study, AI triage removed 41.6% (188/452) of normal images from the worklist without missing visible abnormalities and increased the specificity for some readers without decreasing sensitivity. 
Conclusion: This study suggests the feasibility of sorting and removing normal CXRs using AI with a tailored cut-off to increase efficiency and reduce the workload of radiologists.
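The triage percentages in the Results follow directly from the reported counts. A minimal Python sketch that recomputes them, using only numbers stated in the abstract (the helper name `pct` and the variable names are illustrative, not from the study):

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)

total = 5887             # all CXR images
visible = 405            # abnormal by at least one reader
clearly_visible = 227    # abnormal by at least two readers
normal = total - visible # images without visible referable abnormalities

removed_normal = pct(2354, normal)       # normal images removed by AI triage
missed_visible = pct(14, visible)        # visible abnormalities erroneously removed
missed_clear = pct(4, clearly_visible)   # clearly visible abnormalities erroneously removed

print(removed_normal, missed_visible, missed_clear)
```

The denominator for the removed normal images is 5887 − 405 = 5482, which is why the removal rate is quoted against 5482 rather than against all 5887 images.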

Impact of the Liver Imaging Reporting and Data System on Research Studies of Diagnosing Hepatocellular Carcinoma Using MRI

  • Yura Ahn;Sang Hyun Choi;Jong Keon Jang;So Yeon Kim;Ju Hyun Shim;Seung Soo Lee;Jae Ho Byun
    • Korean Journal of Radiology / Vol. 23, No. 5 / pp. 529-538 / 2022
  • Objective: Since its introduction in 2011, the CT/MRI diagnostic Liver Imaging Reporting and Data System (LI-RADS) has been updated in 2014, 2017, and 2018. We evaluated the impact of CT/MRI diagnostic LI-RADS on liver MRI research methodology for the diagnosis of hepatocellular carcinoma (HCC). Materials and Methods: The MEDLINE, EMBASE, and Cochrane databases were searched for original articles reporting the diagnostic performance of liver MRI for HCC between 2011 and 2019. The MRI techniques, image analysis methods, and diagnostic criteria for HCC used in each study were investigated. The studies were classified into three groups according to the year of publication (2011-2013, 2014-2016, and 2017-2019). We compared the percentage of studies adopting MRI techniques recommended by LI-RADS, image analysis methods in accordance with the lexicon defined in LI-RADS, and diagnostic criteria endorsed by LI-RADS. We compared the pooled sensitivity and specificity between studies that used the LI-RADS and those that did not. Results: This systematic review included 179 studies. The percentages of studies using imaging techniques recommended by LI-RADS were 77.8% for 2011-2013, 85.7% for 2014-2016, and 84.2% for 2017-2019, with no significant difference (p = 0.951). After the introduction of LI-RADS, the percentages of studies following the LI-RADS lexicon were 0.0%, 18.4%, and 56.6% in the respective periods (p < 0.001), while the percentages of studies using the LI-RADS diagnostic imaging criteria were 0.0%, 22.9%, and 60.7%, respectively (p < 0.001). Studies that did not use the LI-RADS and those that used the LI-RADS version 2018 showed no significant difference in sensitivity and specificity (86.3% vs. 77.7%, p = 0.102 and 91.4% vs. 89.9%, p = 0.770, respectively), with some difference in heterogeneity (I² = 94.3% vs. 86.7% in sensitivity and I² = 86.6% vs. 53.2% in specificity).
Conclusion: LI-RADS imparted significant changes in the image analysis methods and diagnostic criteria used in liver MRI research for the diagnosis of HCC.

Validation of CT-Based Risk Stratification System for Lymph Node Metastasis in Patients With Thyroid Cancer

  • Yun Hwa Roh;Sae Rom Chung;Jung Hwan Baek;Young Jun Choi;Tae-Yon Sung;Dong Eun Song;Tae Yong Kim;Jeong Hyun Lee
    • Korean Journal of Radiology / Vol. 24, No. 10 / pp. 1028-1037 / 2023
  • Objective: To evaluate the computed tomography (CT) features for diagnosing metastatic cervical lymph nodes (LNs) in patients with differentiated thyroid cancer (DTC) and validate the CT-based risk stratification system suggested by the Korean Thyroid Imaging Reporting and Data System (K-TIRADS) guidelines. Materials and Methods: A total of 463 LNs from 399 patients with DTC who underwent preoperative CT staging and ultrasound-guided fine-needle aspiration were included. The following CT features for each LN were evaluated: absence of hilum, cystic changes, calcification, strong enhancement, and heterogeneous enhancement. Multivariable logistic regression analysis was performed to identify independent CT features associated with metastatic LNs, and their diagnostic performances were evaluated. LNs were classified into probably benign, indeterminate, and suspicious categories according to the K-TIRADS and the modified LN classification proposed in our study. The diagnostic performance of both classification systems was compared using the exact McNemar and Kosinski tests. Results: The absence of hilum (odds ratio [OR], 4.859; 95% confidence interval [CI], 1.593-14.823; P = 0.005), strong enhancement (OR, 28.755; 95% CI, 12.719-65.007; P < 0.001), and cystic changes (OR, 46.157; 95% CI, 5.07-420.234; P = 0.001) were independently associated with metastatic LNs. All LNs showing calcification were diagnosed as metastases. Heterogeneous enhancement did not show a significant independent association with metastatic LNs. Strong enhancement, calcification, and cystic changes showed moderate to high specificity (70.1%-100%) and positive predictive value (PPV) (91.8%-100%). The absence of the hilum showed high sensitivity (97.8%) but low specificity (34.0%). The modified LN classification, which excluded heterogeneous enhancement from the K-TIRADS, demonstrated higher specificity (70.1% vs. 62.9%, P = 0.016) and PPV (92.5% vs. 90.9%, P = 0.011) than the K-TIRADS. 
Conclusion: Excluding heterogeneous enhancement as a suspicious feature resulted in a higher specificity and PPV for diagnosing metastatic LNs than the K-TIRADS. Our research results may provide a basis for revising the LN classification in future guidelines.

Bone Age Assessment Using Artificial Intelligence in Korean Pediatric Population: A Comparison of Deep-Learning Models Trained With Healthy Chronological and Greulich-Pyle Ages as Labels

  • Pyeong Hwa Kim;Hee Mang Yoon;Jeong Rye Kim;Jae-Yeon Hwang;Jin-Ho Choi;Jisun Hwang;Jaewon Lee;Jinkyeong Sung;Kyu-Hwan Jung;Byeonguk Bae;Ah Young Jung;Young Ah Cho;Woo Hyun Shim;Boram Bak;Jin Seong Lee
    • Korean Journal of Radiology / Vol. 24, No. 11 / pp. 1151-1163 / 2023
  • Objective: To develop a deep-learning-based bone age prediction model optimized for Korean children and adolescents and evaluate its feasibility by comparing it with a Greulich-Pyle-based deep-learning model. Materials and Methods: A convolutional neural network was trained to predict age according to the bone development shown on a hand radiograph (bone age) using 21036 hand radiographs of Korean children and adolescents without known bone development-affecting diseases/conditions obtained between 1998 and 2019 (median age [interquartile range {IQR}], 9 [7-12] years; male:female, 11794:9242) and their chronological ages as labels (Korean model). We constructed 2 separate external datasets consisting of Korean children and adolescents with healthy bone development (Institution 1: n = 343; median age [IQR], 10 [4-15] years; male: female, 183:160; Institution 2: n = 321; median age [IQR], 9 [5-14] years; male: female, 164:157) to test the model performance. The mean absolute error (MAE), root mean square error (RMSE), and proportions of bone age predictions within 6, 12, 18, and 24 months of the reference age (chronological age) were compared between the Korean model and a commercial model (VUNO Med-BoneAge version 1.1; VUNO) trained with Greulich-Pyle-based age as the label (GP-based model). Results: Compared with the GP-based model, the Korean model showed a lower RMSE (11.2 vs. 13.8 months; P = 0.004) and MAE (8.2 vs. 10.5 months; P = 0.002), a higher proportion of bone age predictions within 18 months of chronological age (88.3% vs. 82.2%; P = 0.031) for Institution 1, and a lower MAE (9.5 vs. 11.0 months; P = 0.022) and higher proportion of bone age predictions within 6 months (44.5% vs. 36.4%; P = 0.044) for Institution 2. 
Conclusion: The Korean model trained using the chronological ages of Korean children and adolescents without known bone development-affecting diseases/conditions as labels performed better in bone age assessment than the GP-based model in the Korean pediatric population. Further validation is required to confirm its accuracy.
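For readers less familiar with the error metrics compared above, the following is a minimal Python sketch of MAE, RMSE, and the proportion of predictions within a given window of the reference age. The function name, the sample values, and the choice of an inclusive window (absolute error ≤ window) are assumptions made for illustration; the abstract does not specify these details.

```python
import math

def bone_age_errors(predicted, reference, window_months=6):
    """Return (MAE, RMSE, proportion within window) for ages given in months."""
    diffs = [p - r for p, r in zip(predicted, reference)]
    n = len(diffs)
    mae = sum(abs(d) for d in diffs) / n                # mean absolute error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)     # root mean square error
    # Assumption: "within" is treated as inclusive of the window boundary.
    within = sum(abs(d) <= window_months for d in diffs) / n
    return mae, rmse, within

# Hypothetical ages in months, for illustration only (not study data).
predicted = [100, 110, 130]
reference = [102, 110, 118]   # chronological age serves as the reference
mae, rmse, within_6 = bone_age_errors(predicted, reference)
print(mae, rmse, within_6)
```

Because RMSE squares each error before averaging, a single large miss (12 months in this toy input) raises RMSE more than MAE, which is one reason both are commonly reported together.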

Validation of Ultrasound and Computed Tomography-Based Risk Stratification System and Biopsy Criteria for Cervical Lymph Nodes in Preoperative Patients With Thyroid Cancer

  • Young Hun Jeon;Ji Ye Lee;Roh-Eul Yoo;Jung Hyo Rhim;Kyung Hoon Lee;Kyu Sung Choi;Inpyeong Hwang;Koung Mi Kang;Ji-hoon Kim
    • Korean Journal of Radiology / Vol. 24, No. 9 / pp. 912-923 / 2023
  • Objective: This study aimed to validate the risk stratification system (RSS) and biopsy criteria for cervical lymph nodes (LNs) proposed by the Korean Society of Thyroid Radiology (KSThR). Materials and Methods: This retrospective study included a consecutive series of preoperative patients with thyroid cancer who underwent LN biopsy, ultrasound (US), and computed tomography (CT) between December 2006 and June 2015. LNs were categorized as probably benign, indeterminate, or suspicious according to the current US- and CT-based RSS and the size thresholds for cervical LN biopsy as suggested by the KSThR. The diagnostic performance and unnecessary biopsy rates were calculated. Results: A total of 277 LNs (53.1% metastatic) in 228 patients (mean age ± standard deviation, 47.4 ± 14 years) were analyzed. On US, the malignancy risks were significantly different among the three categories (all P < 0.001); however, CT-detected probably benign and indeterminate LNs showed similarly low malignancy risks (P = 0.468). The combined US + CT criteria stratified the malignancy risks among the three categories (all P < 0.001) and reduced the proportion of indeterminate LNs (from 20.6% to 14.4%) and the malignancy risk in the indeterminate LNs (from 31.6% to 12.5%) compared with US alone. In all image-based classifications, nodal size did not affect the malignancy risks (short diameter [SD] ≤ 5 mm LNs vs. SD > 5 mm LNs, P ≥ 0.177). The criteria covering only suspicious LNs showed higher specificity and lower unnecessary biopsy rates than the current criteria, while maintaining sensitivity in all imaging modalities. Conclusion: Integrative evaluation of US and CT helps in reducing the proportion of indeterminate LNs and the malignancy risk among them. Nodal size did not affect the malignancy risk of LNs, and the addition of indeterminate LNs to biopsy candidates did not have an advantage in detecting LN metastases in any imaging modality.

Development and Validation of 18F-FDG PET/CT-Based Multivariable Clinical Prediction Models for the Identification of Malignancy-Associated Hemophagocytic Lymphohistiocytosis

  • Xu Yang;Xia Lu;Jun Liu;Ying Kan;Wei Wang;Shuxin Zhang;Lei Liu;Jixia Li;Jigang Yang
    • Korean Journal of Radiology / Vol. 23, No. 4 / pp. 466-478 / 2022
  • Objective: 18F-fluorodeoxyglucose (FDG) PET/CT is often used for detecting malignancy in patients with newly diagnosed hemophagocytic lymphohistiocytosis (HLH), with acceptable sensitivity but relatively low specificity. The aim of this study was to improve the diagnostic ability of 18F-FDG PET/CT in identifying malignancy in patients with HLH by combining 18F-FDG PET/CT and clinical parameters. Materials and Methods: Ninety-seven patients (age ≥ 14 years) with secondary HLH were retrospectively reviewed and divided into the derivation (n = 71) and validation (n = 26) cohorts according to admission time. In the derivation cohort, 22 patients had malignancy-associated HLH (M-HLH) and 49 patients had non-malignancy-associated HLH (NM-HLH). Data on pretreatment 18F-FDG PET/CT and laboratory results were collected. The variables were analyzed using the Mann-Whitney U test or Pearson's chi-square test, and a nomogram for predicting M-HLH was constructed using multivariable binary logistic regression. The predictors were also ranked using decision-tree analysis. The nomogram and decision tree were validated in the validation cohort (10 patients with M-HLH and 16 patients with NM-HLH). Results: The ratio of the maximal standardized uptake value (SUVmax) of the lymph nodes to that of the mediastinum, the ratio of the SUVmax of bone lesions or bone marrow to that of the mediastinum, and age were selected for constructing the model. The nomogram showed good performance in predicting M-HLH in the validation cohort, with an area under the receiver operating characteristic curve of 0.875 (95% confidence interval, 0.686-0.971). At an appropriate cutoff value, the sensitivity and specificity for identifying M-HLH were 90% (9/10) and 68.8% (11/16), respectively. The decision tree integrating the same variables showed 70% (7/10) sensitivity and 93.8% (15/16) specificity for identifying M-HLH. 
In comparison, visual analysis of 18F-FDG PET/CT images demonstrated 100% (10/10) sensitivity and 12.5% (2/16) specificity. Conclusion: 18F-FDG PET/CT may be a practical technique for identifying M-HLH. The model constructed using 18F-FDG PET/CT features and age was able to detect malignancy with better accuracy than visual analysis of 18F-FDG PET/CT images.
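The sensitivity and specificity figures for the validation cohort can be reproduced from the stated fractions. The sketch below is a minimal Python illustration; the function name and the derived overall accuracy are additions for this example (the abstract reports accuracy only qualitatively), and the false-negative and false-positive counts are inferred from the stated fractions.

```python
def diagnostic_summary(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, and overall accuracy as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "accuracy": 100 * (tp + tn) / (tp + fn + tn + fp),
    }

# Validation cohort: 10 M-HLH and 16 NM-HLH patients (fractions from the abstract).
methods = {
    "nomogram": diagnostic_summary(tp=9, fn=1, tn=11, fp=5),
    "decision tree": diagnostic_summary(tp=7, fn=3, tn=15, fp=1),
    "visual analysis": diagnostic_summary(tp=10, fn=0, tn=2, fp=14),
}

for name, metrics in methods.items():
    print(name + ": " + ", ".join(f"{k} {v:.1f}%" for k, v in metrics.items()))
```

With only 26 patients in the validation cohort, each single reclassification shifts sensitivity by 10 percentage points and specificity by about 6, so these estimates carry wide uncertainty.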

Deep Learning-Assisted Diagnosis of Pediatric Skull Fractures on Plain Radiographs

  • Jae Won Choi;Yeon Jin Cho;Ji Young Ha;Yun Young Lee;Seok Young Koh;June Young Seo;Young Hun Choi;Jung-Eun Cheon;Ji Hoon Phi;Injoon Kim;Jaekwang Yang;Woo Sun Kim
    • Korean Journal of Radiology / Vol. 23, No. 3 / pp. 343-354 / 2022
  • Objective: To develop and evaluate a deep learning-based artificial intelligence (AI) model for detecting skull fractures on plain radiographs in children. Materials and Methods: This retrospective multi-center study consisted of a development dataset acquired from two hospitals (n = 149 and 264) and an external test set (n = 95) from a third hospital. Datasets included children with head trauma who underwent both skull radiography and cranial computed tomography (CT). The development dataset was split into training, tuning, and internal test sets in a ratio of 7:1:2. The reference standard for skull fracture was cranial CT. Two radiology residents, a pediatric radiologist, and two emergency physicians participated in a two-session observer study on an external test set with and without AI assistance. We obtained the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity along with their 95% confidence intervals (CIs). Results: The AI model showed an AUROC of 0.922 (95% CI, 0.842-0.969) in the internal test set and 0.870 (95% CI, 0.785-0.930) in the external test set. The model had a sensitivity of 81.1% (95% CI, 64.8%-92.0%) and specificity of 91.3% (95% CI, 79.2%-97.6%) for the internal test set and 78.9% (95% CI, 54.4%-93.9%) and 88.2% (95% CI, 78.7%-94.4%), respectively, for the external test set. With the model's assistance, significant AUROC improvement was observed in radiology residents (pooled results) and emergency physicians (pooled results) with the difference from reading without AI assistance of 0.094 (95% CI, 0.020-0.168; p = 0.012) and 0.069 (95% CI, 0.002-0.136; p = 0.043), respectively, but not in the pediatric radiologist with the difference of 0.008 (95% CI, -0.074-0.090; p = 0.850). Conclusion: A deep learning-based AI model improved the performance of inexperienced radiologists and emergency physicians in diagnosing pediatric skull fractures on plain radiographs.

Use of Artificial Intelligence for Reducing Unnecessary Recalls at Screening Mammography: A Simulation Study

  • Yeon Soo Kim;Myoung-jin Jang;Su Hyun Lee;Soo-Yeon Kim;Su Min Ha;Bo Ra Kwon;Woo Kyung Moon;Jung Min Chang
    • Korean Journal of Radiology / Vol. 23, No. 12 / pp. 1241-1250 / 2022
  • Objective: To conduct a simulation study to determine whether artificial intelligence (AI)-aided mammography reading can reduce unnecessary recalls while maintaining cancer detection ability in women recalled after mammography screening. Materials and Methods: A retrospective reader study was performed using the screening mammograms of 793 women (mean age ± standard deviation, 50 ± 9 years) recalled to obtain supplemental mammographic views regarding screening mammography-detected abnormalities between January 2016 and December 2019 at two screening centers. Initial screening mammography examinations were interpreted by three dedicated breast radiologists sequentially, case by case, with and without AI aid, in a single session. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and recall rate for breast cancer diagnosis were obtained and compared between the two reading modes. Results: Fifty-four mammograms with cancer (35 invasive cancers and 19 ductal carcinomas in situ) and 739 mammograms with benign or negative findings were included. The reader-averaged AUC improved after AI aid, from 0.79 (95% confidence interval [CI], 0.74-0.85) to 0.89 (95% CI, 0.85-0.94) (p < 0.001). The reader-averaged specificities before and after AI aid were 41.9% (95% CI, 39.3%-44.5%) and 53.9% (95% CI, 50.9%-56.9%), respectively (p < 0.001). The reader-averaged sensitivity was not statistically different between AI-unaided and AI-aided readings: 89.5% (95% CI, 83.1%-95.9%) vs. 92.6% (95% CI, 86.2%-99.0%) (p = 0.053), although the sensitivities of the least experienced radiologist before and after AI aid were 79.6% (43 of 54 [95% CI, 66.5%-89.4%]) and 90.7% (49 of 54 [95% CI, 79.7%-96.9%]), respectively (p = 0.031). With AI aid, the reader-averaged recall rate decreased from 60.4% (95% CI, 57.8%-62.9%) to 49.5% (95% CI, 46.5%-52.4%) (p < 0.001).
Conclusion: AI-aided reading reduced the number of recalls and improved the diagnostic performance in our simulation using women initially recalled for supplemental mammographic views after mammography screening.

Automated Measurement of Native T1 and Extracellular Volume Fraction in Cardiac Magnetic Resonance Imaging Using a Commercially Available Deep Learning Algorithm

  • Suyon Chang;Kyunghwa Han;Suji Lee;Young Joong Yang;Pan Ki Kim;Byoung Wook Choi;Young Joo Suh
    • Korean Journal of Radiology / Vol. 23, No. 12 / pp. 1251-1259 / 2022
  • Objective: T1 mapping provides valuable information regarding cardiomyopathies. Manual drawing is time consuming and prone to subjective errors. Therefore, this study aimed to test a deep learning (DL) algorithm for the automated measurement of native T1 and extracellular volume (ECV) fractions in cardiac magnetic resonance (CMR) imaging with a temporally separated dataset. Materials and Methods: CMR images of 95 participants (mean age ± standard deviation, 54.5 ± 15.2 years), comprising 36 with left ventricular hypertrophy (12 hypertrophic cardiomyopathy, 12 Fabry disease, and 12 amyloidosis), 32 with dilated cardiomyopathy, and 27 healthy volunteers, were included. A commercial DL algorithm based on 2D U-net (Myomics-T1 software, version 1.0.0) was used for the automated analysis of T1 maps. Four radiologists, as study readers, performed manual analysis. The reference standard was the consensus result of the manual analysis by two additional expert readers. The segmentation performance of the DL algorithm and the correlation and agreement between the automated measurement and the reference standard were assessed. Interobserver agreement among the four radiologists was analyzed. Results: DL successfully segmented the myocardium in 99.3% of slices in the native T1 map and 89.8% of slices in the post-T1 map with Dice similarity coefficients of 0.86 ± 0.05 and 0.74 ± 0.17, respectively. Native T1 and ECV showed strong correlation and agreement between DL and the reference: for T1, r = 0.967 (95% confidence interval [CI], 0.951-0.978) and bias of 9.5 msec (95% limits of agreement [LOA], -23.6 to 42.6 msec); for ECV, r = 0.987 (95% CI, 0.980-0.991) and bias of 0.7% (95% LOA, -2.8% to 4.2%) on a per-subject basis.
Agreements between DL and each of the four radiologists were excellent (intraclass correlation coefficient [ICC] of 0.98-0.99 for both native T1 and ECV), comparable to the pairwise agreement between the radiologists (ICC of 0.97-1.00 and 0.99-1.00 for native T1 and ECV, respectively). Conclusion: The DL algorithm allowed automated T1 and ECV measurements comparable to those of radiologists.
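Two of the agreement measures reported above can be illustrated briefly. The Python sketch below computes a Dice similarity coefficient between two segmentation masks and a Bland-Altman bias with 95% limits of agreement. It assumes the conventional definition of the limits (bias ± 1.96 × standard deviation of the differences), which the abstract does not spell out; the function names and all sample values are hypothetical.

```python
import statistics

def dice(mask_a: set, mask_b: set) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not mask_a and not mask_b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))

def bland_altman(auto, reference):
    """Return (bias, lower LOA, upper LOA) using bias ± 1.96 × SD of differences."""
    diffs = [a - r for a, r in zip(auto, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical pixel coordinates and T1 values in msec; not study data.
auto_mask = {(0, 0), (0, 1), (1, 1), (2, 1)}
manual_mask = {(0, 1), (1, 1), (2, 1), (2, 2)}
dsc = dice(auto_mask, manual_mask)  # 3 shared pixels, 4 pixels in each mask

auto_t1 = [1250.0, 1310.0, 1195.0, 1402.0]
ref_t1 = [1240.0, 1300.0, 1200.0, 1385.0]
bias, loa_low, loa_high = bland_altman(auto_t1, ref_t1)
print(dsc, bias, loa_low, loa_high)
```

Under this definition the limits are symmetric about the bias, which is consistent with the reported T1 values (9.5 msec lies midway between -23.6 and 42.6 msec).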

Development and Validation of MRI-Based Radiomics Models for Diagnosing Juvenile Myoclonic Epilepsy

  • Kyung Min Kim;Heewon Hwang;Beomseok Sohn;Kisung Park;Kyunghwa Han;Sung Soo Ahn;Wonwoo Lee;Min Kyung Chu;Kyoung Heo;Seung-Koo Lee
    • Korean Journal of Radiology / Vol. 23, No. 12 / pp. 1281-1289 / 2022
  • Objective: Radiomic modeling using multiple regions of interest in MRI of the brain to diagnose juvenile myoclonic epilepsy (JME) has not yet been investigated. This study aimed to develop and validate radiomics prediction models to distinguish patients with JME from healthy controls (HCs), and to evaluate the feasibility of a radiomics approach using MRI for diagnosing JME. Materials and Methods: A total of 97 JME patients (25.6 ± 8.5 years; female, 45.5%) and 32 HCs (28.9 ± 11.4 years; female, 50.0%) were randomly split (7:3 ratio) into a training set (n = 90) and a test set (n = 39). Radiomic features were extracted from 22 regions of interest in the brain, selected based on clinical evidence, using T1-weighted MRI. Predictive models were trained using seven modeling methods, including a light gradient boosting machine, support vector classifier, random forest, logistic regression, extreme gradient boosting, gradient boosting machine, and decision tree, with radiomics features in the training set. The performance of the models was validated and compared in the test set. The model with the highest area under the receiver operating characteristic curve (AUROC) was chosen, and important features in the model were identified. Results: The seven tested radiomics models, including light gradient boosting machine, support vector classifier, random forest, logistic regression, extreme gradient boosting, gradient boosting machine, and decision tree, showed AUROC values of 0.817, 0.807, 0.783, 0.779, 0.767, 0.762, and 0.672, respectively. The light gradient boosting machine with the highest AUROC, albeit without statistically significant differences from the other models in pairwise comparisons, had accuracy, precision, recall, and F1 scores of 0.795, 0.818, 0.931, and 0.871, respectively. Radiomic features, including the putamen and ventral diencephalon, were ranked as the most important for suggesting JME. Conclusion: Radiomic models using MRI were able to differentiate JME from HCs.
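The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. A minimal Python sketch (the function name `f1_from_pr` is illustrative):

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values reported for the light gradient boosting machine.
precision, recall = 0.818, 0.931
f1 = f1_from_pr(precision, recall)
print(round(f1, 3))
```

The result rounds to 0.871, matching the value given in the abstract.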