Improving Radiology Report Generation with Adaptive Attention

Lin Wang; Jie Chen

doi:10.1007/978-3-031-14771-5_21

Lin Wang, Jie Chen^*

^*Corresponding author for this work

Peking University

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

1 Citation (Scopus)

Abstract

To reduce the tedious and laborious work of writing radiology reports, automatic radiology report generation has drawn great attention in recent years. As a vision-to-language task, radiology report generation depends equally on visual features and language features. However, previous methods mainly focus on generating fluent reports and neglect the importance of better extracting and utilizing visual information. With this in mind, we propose a novel architecture with a CLIP-based visual extractor and a Multi-Head Adaptive Attention (MHAA) module to address these two issues: the vision-language pretrained encoders extract richer visual information, and during report generation MHAA controls the visual information that participates in the generation of each word. Experiments conducted on two public datasets demonstrate that our method outperforms state-of-the-art methods on all metrics.

Original language	English
Title of host publication	Studies in Computational Intelligence
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	293-305
Number of pages	13
DOI	https://doi.org/10.1007/978-3-031-14771-5_21
Publication status	Published - 2023
Externally published	Yes

Publication series

Name	Studies in Computational Intelligence
Volume	1060
ISSN (Print)	1860-949X
ISSN (Electronic)	1860-9503

Cite this

Wang, L., & Chen, J. (2023). Improving Radiology Report Generation with Adaptive Attention. In Studies in Computational Intelligence (pp. 293-305). (Studies in Computational Intelligence; Vol. 1060). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-14771-5_21

@inbook{3cf79ff5c03e47f089e075d0d2059e5d,
  title = "Improving Radiology Report Generation with Adaptive Attention",
  abstract = "To avoid the tedious and laborious radiology report writing, automatic radiology reports generation has drawn great attention in recent years. As vision to language task, visual features and language features are equally important for radiology report generation. However, previous methods mainly pay attention to generating fluent reports, which neglects the eminent importance of how to better extract and utilize vision information. Keeping this in mind, we propose a novel architecture with a CLIP-based visual extractor and Multi-Head Adaptive Attention (MHAA) module to address the above two issues: through the vision-language pretrained encoders, more sufficient visual information has been explored, then during report generation, MHAA controls the visual information participating in the generation of each word. Experiments conducted on two public datasets demonstrate that our method outperforms state-of-the-art methods on all the metrics.",
  keywords = "Adaptive attention mechanism, Radiology report generation, Visual encoder",
  author = "Lin Wang and Jie Chen",
  note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.",
  year = "2023",
  doi = "10.1007/978-3-031-14771-5_21",
  language = "English",
  series = "Studies in Computational Intelligence",
  publisher = "Springer Science and Business Media Deutschland GmbH",
  pages = "293--305",
  booktitle = "Studies in Computational Intelligence",
  address = "Germany",
}

TY  - CHAP
T1  - Improving Radiology Report Generation with Adaptive Attention
AU  - Wang, Lin
AU  - Chen, Jie
PY  - 2023
Y1  - 2023
N2  - To avoid the tedious and laborious radiology report writing, automatic radiology reports generation has drawn great attention in recent years. As vision to language task, visual features and language features are equally important for radiology report generation. However, previous methods mainly pay attention to generating fluent reports, which neglects the eminent importance of how to better extract and utilize vision information. Keeping this in mind, we propose a novel architecture with a CLIP-based visual extractor and Multi-Head Adaptive Attention (MHAA) module to address the above two issues: through the vision-language pretrained encoders, more sufficient visual information has been explored, then during report generation, MHAA controls the visual information participating in the generation of each word. Experiments conducted on two public datasets demonstrate that our method outperforms state-of-the-art methods on all the metrics.
AB  - To avoid the tedious and laborious radiology report writing, automatic radiology reports generation has drawn great attention in recent years. As vision to language task, visual features and language features are equally important for radiology report generation. However, previous methods mainly pay attention to generating fluent reports, which neglects the eminent importance of how to better extract and utilize vision information. Keeping this in mind, we propose a novel architecture with a CLIP-based visual extractor and Multi-Head Adaptive Attention (MHAA) module to address the above two issues: through the vision-language pretrained encoders, more sufficient visual information has been explored, then during report generation, MHAA controls the visual information participating in the generation of each word. Experiments conducted on two public datasets demonstrate that our method outperforms state-of-the-art methods on all the metrics.
KW  - Adaptive attention mechanism
KW  - Radiology report generation
KW  - Visual encoder
UR  - http://www.scopus.com/inward/record.url?scp=85143127642&partnerID=8YFLogxK
U2  - 10.1007/978-3-031-14771-5_21
DO  - 10.1007/978-3-031-14771-5_21
M3  - Chapter
AN  - SCOPUS:85143127642
T3  - Studies in Computational Intelligence
SP  - 293
EP  - 305
BT  - Studies in Computational Intelligence
PB  - Springer Science and Business Media Deutschland GmbH
ER  -
