Disease Diagnosis on Fundus Images: A Cross-Dataset Study

  • Van-Nguyen Pham (Dept. of Electrical and Computer Engineering, Sungkyunkwan University) ;
  • Sun Xiaoying (Dept. of Electrical and Computer Engineering, Sungkyunkwan University) ;
  • Hyunseung Choo (Dept. of Electrical and Computer Engineering, Sungkyunkwan University)
  • Published: 2024.10.31

Abstract

This paper presents a comparative study of five deep learning models, ResNet50, DenseNet121, Vision Transformer (ViT), Swin Transformer (SwinT), and CoAtNet, on the task of multi-label classification of fundus images for ocular diseases. The models were trained on the Ocular Disease Intelligent Recognition (ODIR) dataset and validated on the Retinal Fundus Multi-disease Image Dataset (RFMiD), with a focus on five disease classes: diabetic retinopathy, glaucoma, cataract, age-related macular degeneration, and myopia. Performance was evaluated using the area under the receiver operating characteristic curve (AUC-ROC) for each class. CoAtNet achieved the best AUC-ROC scores for diabetic retinopathy, glaucoma, cataract, and myopia, while ViT outperformed CoAtNet for age-related macular degeneration. Overall, CoAtNet exhibited the highest average performance across all classes, highlighting the effectiveness of hybrid convolution-attention architectures in medical image classification. These findings suggest that CoAtNet may be a promising model for multi-label classification of fundus images in cross-dataset scenarios.
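The per-class evaluation described above can be sketched as follows. This is an illustrative example, not the authors' code: the class names, toy labels, and scores are hypothetical, and the AUC-ROC is computed via the Mann-Whitney formulation (the probability that a random positive scores above a random negative, with ties counting half).

```python
# Illustrative sketch (not the paper's implementation): per-class AUC-ROC
# for multi-label fundus predictions, one binary column per disease.

def auc_roc(y_true, y_score):
    """AUC via Mann-Whitney: P(random positive scores above random negative)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical class order matching the five diseases in the abstract.
CLASSES = ["DR", "glaucoma", "cataract", "AMD", "myopia"]

def per_class_auc(labels, scores):
    """labels/scores: rows of 5 binary labels / 5 sigmoid-style scores."""
    return {c: auc_roc([row[i] for row in labels],
                       [row[i] for row in scores])
            for i, c in enumerate(CLASSES)}

# Toy example: 4 images, 5 labels each (values are made up).
labels = [[1, 0, 0, 1, 0],
          [0, 1, 0, 0, 1],
          [1, 0, 1, 0, 0],
          [0, 1, 0, 1, 1]]
scores = [[0.9, 0.1, 0.2, 0.8, 0.3],
          [0.2, 0.8, 0.1, 0.3, 0.7],
          [0.7, 0.2, 0.9, 0.1, 0.2],
          [0.1, 0.9, 0.3, 0.6, 0.8]]
aucs = per_class_auc(labels, scores)
```

Averaging the five per-class values gives the single summary score on which the abstract ranks the models.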

Funding

This work was supported in part by the BK21 FOUR Project (50%) and in part by the Korea government (MSIT), IITP, under the ICT Creative Consilience program (RS-2020-II201821, 25%) and the Development of Brain Disease (Stroke) project (RS-2024-00459512, 25%).

References

  1. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016.
  2. Huang, Gao, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. "Densely connected convolutional networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700-4708. 2017.
  3. Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
  4. Liu, Ze, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. "Swin transformer: Hierarchical vision transformer using shifted windows." In Proceedings of the IEEE/CVF international conference on computer vision, pp. 10012-10022. 2021.
  5. Dai, Zihang, Hanxiao Liu, Quoc V. Le, and Mingxing Tan. "CoAtNet: Marrying convolution and attention for all data sizes." Advances in neural information processing systems 34 (2021): 3965-3977.