Self-Supervised Multi-Modal Learning for Fundus Image Analysis Using Contrastive and Generative Learning

  • Toan Duc Nguyen (Dept. of AI Systems Engineering, Sungkyunkwan University)
  • Sun Xiaoying (Dept. of Electrical and Computer Engineering, Sungkyunkwan University)
  • Hyunseung Choo (Dept. of AI Systems Engineering, Sungkyunkwan University)
  • Published: 2024.10.31

Abstract

In this study, we propose a self-supervised learning framework for fundus image analysis that uses both contrastive and generative objectives during pre-training. The contrastive branch integrates the image and text modalities through cross-attention, allowing the model to learn more informative and semantically rich representations. After pre-training, the model is evaluated on downstream tasks in zero-shot, few-shot, and full fine-tuning settings. Experimental results show that our method significantly outperforms existing approaches, achieving 15% higher performance in the zero-shot setting, 4.5% in few-shot, and 10.1% in full fine-tuning. The proposed method demonstrates its potential in medical imaging, where access to large annotated datasets is often limited. By efficiently leveraging both image and textual information, our approach improves the accuracy and generalizability of models for fundus image analysis, highlighting its broader applicability in medical diagnostics and healthcare.
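As a rough illustration of the pre-training recipe described above, the sketch below combines a CLIP-style contrastive image-text loss with a simple generative (caption-reconstruction) loss and a cross-attention block in which report tokens attend over image features. This is not the authors' implementation: the tiny encoders, dimensions, vocabulary size, and random toy data are assumptions made purely for the example.

```python
# Minimal sketch (not the authors' code) of contrastive + generative pre-training
# with image-text cross-attention. Encoder choices, dimensions, and data are
# illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttentionFusion(nn.Module):
    """Report tokens attend over image features (assumed fusion design)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_feats):
        fused, _ = self.attn(query=text_tokens, key=image_feats, value=image_feats)
        return self.norm(text_tokens + fused)


class FundusPretrainSketch(nn.Module):
    def __init__(self, dim=256, vocab_size=1000):
        super().__init__()
        # Stand-ins for a real image encoder (e.g. a ViT) and a text encoder.
        self.image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
        self.text_embed = nn.Embedding(vocab_size, dim)
        self.fusion = CrossAttentionFusion(dim)
        self.caption_head = nn.Linear(dim, vocab_size)          # generative branch
        self.logit_scale = nn.Parameter(torch.tensor(2.65))     # learnable temperature

    def forward(self, images, token_ids):
        img_global = F.normalize(self.image_encoder(images), dim=-1)   # (B, D)
        txt_tokens = self.text_embed(token_ids)                        # (B, T, D)
        txt_global = F.normalize(txt_tokens.mean(dim=1), dim=-1)       # (B, D)

        # Contrastive objective: matched fundus image / report pairs are positives.
        logits = self.logit_scale.exp() * img_global @ txt_global.t()  # (B, B)
        targets = torch.arange(images.size(0))
        loss_contrastive = (F.cross_entropy(logits, targets)
                            + F.cross_entropy(logits.t(), targets)) / 2

        # Generative objective: reconstruct report tokens from image-conditioned
        # text features (a stand-in for a true captioning loss, which would use
        # causal masking and next-token targets).
        image_seq = img_global.unsqueeze(1)                            # (B, 1, D)
        fused = self.fusion(txt_tokens, image_seq)                     # (B, T, D)
        loss_generative = F.cross_entropy(
            self.caption_head(fused).transpose(1, 2), token_ids)

        return loss_contrastive + loss_generative


# Toy usage on random tensors; real pre-training would feed fundus images
# paired with their clinical report tokens.
model = FundusPretrainSketch()
images = torch.randn(4, 3, 32, 32)
token_ids = torch.randint(0, 1000, (4, 12))
loss = model(images, token_ids)
loss.backward()
```

In a zero-shot setting, the same normalized image and text embeddings could be compared directly against text prompts for candidate diagnoses, with no further fine-tuning of the encoders.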

Acknowledgement

This work was partly supported by the BK21 FOUR Project and by the Korea government (MSIT) through IITP, under the ICT Creative Consilience program (RS-2020-II201821, 50%), the AI Innovation Hub (RS-2021-II212068, 25%), and the AI Graduate School Program at Sungkyunkwan University (RS-2019-II190421, 25%).
