TY - CHAP
T1 - Improving Radiology Report Generation with Adaptive Attention
AU - Wang, Lin
AU - Chen, Jie
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Automatic radiology report generation has drawn great attention in recent years because it can spare radiologists the tedious and laborious work of writing reports. As a vision-to-language task, radiology report generation depends equally on visual features and language features. However, previous methods focus mainly on generating fluent reports and neglect the importance of better extracting and utilizing visual information. With this in mind, we propose a novel architecture with a CLIP-based visual extractor and a Multi-Head Adaptive Attention (MHAA) module to address these two issues: the vision-language pretrained encoders extract richer visual information, and during report generation MHAA controls the visual information that participates in the generation of each word. Experiments conducted on two public datasets demonstrate that our method outperforms state-of-the-art methods on all metrics.
KW - Adaptive attention mechanism
KW - Radiology report generation
KW - Visual encoder
UR - http://www.scopus.com/inward/record.url?scp=85143127642&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-14771-5_21
DO - 10.1007/978-3-031-14771-5_21
M3 - Chapter
AN - SCOPUS:85143127642
T3 - Studies in Computational Intelligence
SP - 293
EP - 305
BT - Studies in Computational Intelligence
PB - Springer Science and Business Media Deutschland GmbH
ER -