From Masked Reconstructions to Disease Diagnostics: A Vision Transformer Approach for Fundus Images

  • Toan Duc Nguyen (Dept. of AI Systems Engineering, Sungkyunkwan University)
  • Gyurin Byun (Dept. of AI Systems Engineering, Sungkyunkwan University)
  • Hyunseung Choo (Dept. of AI Systems Engineering, Sungkyunkwan University)
  • Published : 2023.11.02

Abstract

In this paper, we introduce a pre-training method that uses the Vision Transformer (ViT) for disease diagnosis in conventional fundus images. Because medical images require effective representation learning, our method combines the ViT with a Masked Autoencoder (MAE) that generates meaningful and pertinent image augmentations. During pre-training, the MAE produces an altered version of the original image, and the two images together form a positive pair. The ViT then applies contrastive learning to this pair to refine its weights. Our experiments demonstrate that this dual-model approach harnesses the strengths of both the ViT and the MAE, yielding robust and clinically relevant feature embeddings. Preliminary results suggest significant improvements in diagnostic accuracy, underscoring the potential of the method for automated disease diagnosis in fundus imaging.
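
The contrastive step described in the abstract can be illustrated with a small sketch. The abstract does not say which contrastive objective is used, so the InfoNCE loss below is an assumption, as are the function names `l2_normalize` and `info_nce_loss`, the temperature of 0.1, and the toy two-dimensional embeddings. The sketch treats the embedding of each original image and the embedding of its Masked Autoencoder reconstruction as a positive pair, and the other reconstructions in the batch as negatives. It is written in plain Python to stay dependency-free; a real pipeline would compute the embeddings with a ViT encoder in a tensor library.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(original_emb, reconstructed_emb, temperature=0.1):
    """Mean InfoNCE loss over a batch of (original, reconstruction) embedding pairs.

    Row i of each argument is assumed to come from the same fundus image, so
    (i, i) is the positive pair and every (i, j != i) acts as a negative.
    """
    a = [l2_normalize(v) for v in original_emb]
    b = [l2_normalize(v) for v in reconstructed_emb]
    total = 0.0
    for i, anchor in enumerate(a):
        # Temperature-scaled cosine similarity of anchor i to every reconstruction.
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature for other in b]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of picking the true partner of anchor i.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy embeddings standing in for encoder outputs (hypothetical values).
originals = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]    # reconstructions close to their originals
shuffled = [[0.1, 0.9], [0.9, 0.1]]   # reconstructions matched to the wrong image

loss_aligned = info_nce_loss(originals, aligned)
loss_shuffled = info_nce_loss(originals, shuffled)
```

In this toy batch, `loss_aligned` is smaller than `loss_shuffled`, which is the behavior a contrastive objective relies on: the loss falls as each original image's embedding moves closer to that of its own reconstruction than to those of other images. The sketch does not guard against zero-length embedding vectors.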

Acknowledgement

This work was supported in part by an Institute for Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea Government [Ministry of Science and ICT (MSIT)] under the Artificial Intelligence Innovation Hub (2021-0-02068), the ICT Creative Consilience Program (2023-2020-0-01821), and the AI Graduate School program (IITP-2019-0-00421).