RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports

Jiawei Du; Jia Guo; Weihang Zhang; Shengzhu Yang; Hanruo Liu; Huiqi Li; Ningli Wang

doi:10.1007/978-3-031-72390-2_66

Corresponding author: Huiqi Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

1 citation (Scopus)

Abstract

Vision-language foundation models are increasingly investigated in computer vision and natural language processing, yet their exploration in ophthalmology and broader medical applications remains limited. The challenge is the lack of labeled data for training foundation models. To address this issue, a CLIP-style retinal image foundation model is developed in this paper. Our foundation model, RET-CLIP, is trained on a dataset of 193,865 patients to extract general features of color fundus photographs (CFPs), employing a tripartite optimization strategy that focuses on the left-eye, right-eye, and patient levels to reflect real-world clinical scenarios. Extensive experiments demonstrate that RET-CLIP outperforms existing benchmarks across eight diverse datasets spanning four critical diagnostic categories: diabetic retinopathy, glaucoma, multiple disease diagnosis, and multi-label classification of multiple diseases, demonstrating the performance and generality of our foundation model. The source code and pre-trained model are available at https://github.com/sStonemason/RET-CLIP.
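The abstract does not spell out the tripartite objective, so the following is only a minimal sketch under stated assumptions: each of the three levels (left eye, right eye, patient) is assumed to contribute a standard CLIP-style symmetric contrastive term, and the three terms are assumed to be summed with equal weight. The function names, the temperature value, and the equal weighting are illustrative choices, not taken from the paper or its repository.

```python
import math

def _normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def _log_softmax_at(row, k):
    """Log-softmax of row evaluated at index k (numerically stable)."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[k] - lse

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss; row i of img_emb is paired with row i of txt_emb."""
    img = [_normalize(v) for v in img_emb]
    txt = [_normalize(v) for v in txt_emb]
    n = len(img)
    # Cosine similarities scaled by the temperature (assumed value).
    logits = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: softmax over each row.
    i2t = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: softmax over each column.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(_log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (i2t + t2i)

def tripartite_loss(left, right, patient):
    """Sum of the three level-wise losses (equal weights are an assumption).

    Each argument is an (image_embeddings, text_embeddings) pair for one level.
    """
    return sum(clip_loss(img, txt) for img, txt in (left, right, patient))
```

In this sketch, correctly paired embeddings yield a lower loss than mismatched ones at every level, which is the property a contrastive pre-training objective relies on.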

Original language	English
Title of host publication	Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings
Editors	Marius George Linguraru, Qi Dou, Aasa Feragen, Stamatia Giannarou, Ben Glocker, Karim Lekadir, Julia A. Schnabel
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	709-719
Number of pages	11
ISBN (Print)	9783031723896
DOI	https://doi.org/10.1007/978-3-031-72390-2_66
Publication status	Published - 2024
Event	27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024 - Marrakesh, Morocco. Duration: 6 Oct 2024 → 10 Oct 2024

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	15012 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024
Country/Territory	Morocco
City	Marrakesh
Period	6/10/24 → 10/10/24

Cite this

Du, J., Guo, J., Zhang, W., Yang, S., Liu, H., Li, H., & Wang, N. (2024). RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports. In M. G. Linguraru, Q. Dou, A. Feragen, S. Giannarou, B. Glocker, K. Lekadir, & J. A. Schnabel (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings (pp. 709-719). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15012 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-72390-2_66

Du, Jiawei ; Guo, Jia ; Zhang, Weihang et al. / RET-CLIP : A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings. Editors / Marius George Linguraru ; Qi Dou ; Aasa Feragen ; Stamatia Giannarou ; Ben Glocker ; Karim Lekadir ; Julia A. Schnabel. Springer Science and Business Media Deutschland GmbH, 2024. pp. 709-719 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{c3e0b48fb2f0466aa5283e5359b987cd,
title = "RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports",
abstract = "Vision-language foundation models are increasingly investigated in computer vision and natural language processing, yet their exploration in ophthalmology and broader medical applications remains limited. The challenge is the lack of labeled data for training foundation models. To address this issue, a CLIP-style retinal image foundation model is developed in this paper. Our foundation model, RET-CLIP, is trained on a dataset of 193,865 patients to extract general features of color fundus photographs (CFPs), employing a tripartite optimization strategy that focuses on the left-eye, right-eye, and patient levels to reflect real-world clinical scenarios. Extensive experiments demonstrate that RET-CLIP outperforms existing benchmarks across eight diverse datasets spanning four critical diagnostic categories: diabetic retinopathy, glaucoma, multiple disease diagnosis, and multi-label classification of multiple diseases, demonstrating the performance and generality of our foundation model. The source code and pre-trained model are available at https://github.com/sStonemason/RET-CLIP.",
keywords = "Foundation Model, Retinal Fundus Image, Vision-Language Pre-training",
author = "Jiawei Du and Jia Guo and Weihang Zhang and Shengzhu Yang and Hanruo Liu and Huiqi Li and Ningli Wang",
note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.; 27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024 ; Conference date: 06-10-2024 Through 10-10-2024",
year = "2024",
doi = "10.1007/978-3-031-72390-2_66",
language = "English",
isbn = "9783031723896",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "709--719",
editor = "Linguraru, {Marius George} and Qi Dou and Aasa Feragen and Stamatia Giannarou and Ben Glocker and Karim Lekadir and Schnabel, {Julia A.}",
booktitle = "Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings",
address = "Germany",
}

Du, J, Guo, J, Zhang, W, Yang, S, Liu, H, Li, H & Wang, N 2024, RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports. in MG Linguraru, Q Dou, A Feragen, S Giannarou, B Glocker, K Lekadir & JA Schnabel (eds), Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 15012 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 709-719, 27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024, Marrakesh, Morocco, 6/10/24. https://doi.org/10.1007/978-3-031-72390-2_66

RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports. / Du, Jiawei; Guo, Jia; Zhang, Weihang et al.
Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings. Editors / Marius George Linguraru; Qi Dou; Aasa Feragen; Stamatia Giannarou; Ben Glocker; Karim Lekadir; Julia A. Schnabel. Springer Science and Business Media Deutschland GmbH, 2024. pp. 709-719 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15012 LNCS).

TY  - GEN
T1  - RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports
T2  - 27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024
AU  - Du, Jiawei
AU  - Guo, Jia
AU  - Zhang, Weihang
AU  - Yang, Shengzhu
AU  - Liu, Hanruo
AU  - Li, Huiqi
AU  - Wang, Ningli
N1  - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY  - 2024
Y1  - 2024
AB  - Vision-language foundation models are increasingly investigated in computer vision and natural language processing, yet their exploration in ophthalmology and broader medical applications remains limited. The challenge is the lack of labeled data for training foundation models. To address this issue, a CLIP-style retinal image foundation model is developed in this paper. Our foundation model, RET-CLIP, is trained on a dataset of 193,865 patients to extract general features of color fundus photographs (CFPs), employing a tripartite optimization strategy that focuses on the left-eye, right-eye, and patient levels to reflect real-world clinical scenarios. Extensive experiments demonstrate that RET-CLIP outperforms existing benchmarks across eight diverse datasets spanning four critical diagnostic categories: diabetic retinopathy, glaucoma, multiple disease diagnosis, and multi-label classification of multiple diseases, demonstrating the performance and generality of our foundation model. The source code and pre-trained model are available at https://github.com/sStonemason/RET-CLIP.
KW  - Foundation Model
KW  - Retinal Fundus Image
KW  - Vision-Language Pre-training
UR  - http://www.scopus.com/inward/record.url?scp=85208174466&partnerID=8YFLogxK
U2  - 10.1007/978-3-031-72390-2_66
DO  - 10.1007/978-3-031-72390-2_66
M3  - Conference contribution
AN  - SCOPUS:85208174466
SN  - 9783031723896
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 709
EP  - 719
BT  - Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings
A2  - Linguraru, Marius George
A2  - Dou, Qi
A2  - Feragen, Aasa
A2  - Giannarou, Stamatia
A2  - Glocker, Ben
A2  - Lekadir, Karim
A2  - Schnabel, Julia A.
PB  - Springer Science and Business Media Deutschland GmbH
Y2  - 6 October 2024 through 10 October 2024
ER  -

Du J, Guo J, Zhang W, Yang S, Liu H, Li H et al. RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports. In Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors, Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 - 27th International Conference, Proceedings. Springer Science and Business Media Deutschland GmbH. 2024. p. 709-719. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-72390-2_66
