Clinical-BERT: Vision-Language Pre-training for Radiograph Diagnosis and Reports Generation

Bin Yan; Mingtao Pei

Corresponding author: Mingtao Pei

School of Computer Science

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

32 Citations (Scopus)

Abstract

In this paper, we propose Clinical-BERT, a vision-language pre-training model for the medical domain. To pre-train it, we devise three domain-specific tasks, Clinical Diagnosis (CD), Masked MeSH Modeling (MMM), and Image-MeSH Matching (IMM), together with one general pre-training task, Masked Language Modeling (MLM). The CD task helps the model learn medical domain knowledge by predicting diseases from radiographs. Medical Subject Headings (MeSH) words are important semantic components of radiograph reports, and the MMM task helps the model focus on predicting MeSH words. The IMM task helps the model learn the alignment between MeSH words and radiographs through matching scores computed with a two-level sparse attention: region sparse attention and word sparse attention. Region sparse attention generates the corresponding visual features for each word, and word sparse attention enhances the contribution of image-MeSH matching to the matching scores. To the best of our knowledge, this is the first attempt to learn domain knowledge during pre-training in the medical domain. We evaluate the pre-trained model on radiograph diagnosis and report generation tasks across four challenging datasets, MIMIC-CXR, IU X-Ray, COV-CTR, and NIH, and achieve state-of-the-art results on all tasks, which demonstrates the effectiveness of our pre-training model.
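The abstract describes the IMM matching score only at a high level. The Python sketch below is one possible reading of the two-level sparse attention, not the authors' implementation. It assumes cosine similarity between word and region features, assumes region sparse attention keeps each word's top-k regions before a softmax, and assumes word sparse attention is a softmax restricted to MeSH-word positions. All function names, the parameter `k_regions`, and the toy feature vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def sparse_softmax(scores, k):
    """Softmax over the k largest scores; every other entry gets weight 0."""
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    top = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - top) for i in keep}
    total = sum(exps.values())
    return [exps.get(i, 0.0) / total for i in range(len(scores))]


def region_sparse_attention(word, regions, k):
    """Assumption: a word attends only to its k most similar image regions."""
    weights = sparse_softmax([cosine(word, r) for r in regions], k)
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]


def matching_score(words, is_mesh, regions, k_regions=2):
    """Hypothetical image-report score in which only MeSH words contribute."""
    # Level 1: relevance of each word to its region-attended visual feature.
    relevance = [
        cosine(w, region_sparse_attention(w, regions, k_regions)) for w in words
    ]
    # Level 2 (assumption): word attention is a softmax over MeSH positions only,
    # so non-MeSH words receive zero weight in the final score.
    mesh_idx = [i for i, flag in enumerate(is_mesh) if flag]
    if not mesh_idx:
        return 0.0
    top = max(relevance[i] for i in mesh_idx)
    exps = {i: math.exp(relevance[i] - top) for i in mesh_idx}
    total = sum(exps.values())
    return sum(exps[i] / total * relevance[i] for i in mesh_idx)


if __name__ == "__main__":
    # Toy 2-D features: three image regions and a three-word report.
    regions = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    words = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    is_mesh = [True, True, False]  # the third word is treated as non-MeSH
    print(round(matching_score(words, is_mesh, regions), 3))
```

Because non-MeSH positions receive zero weight in this sketch, changing a non-MeSH word leaves the score unchanged; that is how the sketch models the abstract's statement that word sparse attention emphasizes image-MeSH matching.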

Original language	English
Title of host publication	AAAI-22 Technical Tracks 3
Publisher	Association for the Advancement of Artificial Intelligence
Pages	2982-2990
Number of pages	9
ISBN (Electronic)	1577358767, 9781577358763
Publication status	Published - 30 Jun 2022
Event	36th AAAI Conference on Artificial Intelligence, AAAI 2022 - Virtual, Online. Duration: 22 Feb 2022 → 1 Mar 2022

Publication series

Name	Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
Volume	36

Conference

Conference	36th AAAI Conference on Artificial Intelligence, AAAI 2022
City	Virtual, Online
Period	22/02/22 → 1/03/22

Cite this

@inproceedings{d4516855b2dd4fe795f2b42feb467488,

title = "Clinical-BERT: Vision-Language Pre-training for Radiograph Diagnosis and Reports Generation",

abstract = "In this paper, we propose a vision-language pre-training model, Clinical-BERT, for the medical domain, and devise three domain-specific tasks: Clinical Diagnosis (CD), Masked MeSH Modeling (MMM), Image-MeSH Matching (IMM), together with one general pre-training task: Masked Language Modeling (MLM), to pre-train the model. The CD task helps the model to learn medical domain knowledge by predicting disease from radiographs. Medical Subject Headings (MeSH) words are important semantic components in radiograph reports, and the MMM task helps the model focus on the prediction of MeSH words. The IMM task helps the model learn the alignment of MeSH words with radiographs by matching scores obtained by a two-level sparse attention: region sparse attention and word sparse attention. Region sparse attention generates corresponding visual features for each word, and word sparse attention enhances the contribution of images-MeSH matching to the matching scores. To the best of our knowledge, this is the first attempt to learn domain knowledge during pre-training for the medical domain. We evaluate the pre-training model on Radiograph Diagnosis and Reports Generation tasks across four challenging datasets: MIMIC-CXR, IU X-Ray, COV-CTR, and NIH, and achieve state-of-the-art results for all the tasks, which demonstrates the effectiveness of our pre-training model.",

author = "Bin Yan and Mingtao Pei",

note = "Publisher Copyright: Copyright {\textcopyright} 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 36th AAAI Conference on Artificial Intelligence, AAAI 2022 ; Conference date: 22-02-2022 Through 01-03-2022",

year = "2022",

month = jun,

day = "30",

language = "English",

series = "Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022",

publisher = "Association for the Advancement of Artificial Intelligence",

pages = "2982--2990",

booktitle = "AAAI-22 Technical Tracks 3",

}

Yan, B & Pei, M 2022, Clinical-BERT: Vision-Language Pre-training for Radiograph Diagnosis and Reports Generation. in AAAI-22 Technical Tracks 3. Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022, vol. 36, Association for the Advancement of Artificial Intelligence, pp. 2982-2990, 36th AAAI Conference on Artificial Intelligence, AAAI 2022, Virtual, Online, 22/02/22.

Clinical-BERT: Vision-Language Pre-training for Radiograph Diagnosis and Reports Generation. / Yan, Bin; Pei, Mingtao.
AAAI-22 Technical Tracks 3. Association for the Advancement of Artificial Intelligence, 2022. p. 2982-2990 (Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022; Vol. 36).

TY - GEN

T1 - Clinical-BERT

T2 - 36th AAAI Conference on Artificial Intelligence, AAAI 2022

AU - Yan, Bin

AU - Pei, Mingtao

PY - 2022/6/30

Y1 - 2022/6/30

N2 - In this paper, we propose a vision-language pre-training model, Clinical-BERT, for the medical domain, and devise three domain-specific tasks: Clinical Diagnosis (CD), Masked MeSH Modeling (MMM), Image-MeSH Matching (IMM), together with one general pre-training task: Masked Language Modeling (MLM), to pre-train the model. The CD task helps the model to learn medical domain knowledge by predicting disease from radiographs. Medical Subject Headings (MeSH) words are important semantic components in radiograph reports, and the MMM task helps the model focus on the prediction of MeSH words. The IMM task helps the model learn the alignment of MeSH words with radiographs by matching scores obtained by a two-level sparse attention: region sparse attention and word sparse attention. Region sparse attention generates corresponding visual features for each word, and word sparse attention enhances the contribution of images-MeSH matching to the matching scores. To the best of our knowledge, this is the first attempt to learn domain knowledge during pre-training for the medical domain. We evaluate the pre-training model on Radiograph Diagnosis and Reports Generation tasks across four challenging datasets: MIMIC-CXR, IU X-Ray, COV-CTR, and NIH, and achieve state-of-the-art results for all the tasks, which demonstrates the effectiveness of our pre-training model.

AB - In this paper, we propose a vision-language pre-training model, Clinical-BERT, for the medical domain, and devise three domain-specific tasks: Clinical Diagnosis (CD), Masked MeSH Modeling (MMM), Image-MeSH Matching (IMM), together with one general pre-training task: Masked Language Modeling (MLM), to pre-train the model. The CD task helps the model to learn medical domain knowledge by predicting disease from radiographs. Medical Subject Headings (MeSH) words are important semantic components in radiograph reports, and the MMM task helps the model focus on the prediction of MeSH words. The IMM task helps the model learn the alignment of MeSH words with radiographs by matching scores obtained by a two-level sparse attention: region sparse attention and word sparse attention. Region sparse attention generates corresponding visual features for each word, and word sparse attention enhances the contribution of images-MeSH matching to the matching scores. To the best of our knowledge, this is the first attempt to learn domain knowledge during pre-training for the medical domain. We evaluate the pre-training model on Radiograph Diagnosis and Reports Generation tasks across four challenging datasets: MIMIC-CXR, IU X-Ray, COV-CTR, and NIH, and achieve state-of-the-art results for all the tasks, which demonstrates the effectiveness of our pre-training model.

UR - http://www.scopus.com/inward/record.url?scp=85146499554&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85146499554

T3 - Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022

SP - 2982

EP - 2990

BT - AAAI-22 Technical Tracks 3

PB - Association for the Advancement of Artificial Intelligence

Y2 - 22 February 2022 through 1 March 2022

ER -
