TY - GEN
T1 - Automatic Radiology Reports Generation via Memory Alignment Network
AU - Shen, Hongyu
AU - Pei, Mingtao
AU - Liu, Juncai
AU - Tian, Zhaoxing
N1 - Publisher Copyright:
Copyright © 2024, Association for the Advancement of Artificial Intelligence.
PY - 2024/3/25
Y1 - 2024/3/25
N2 - The automatic generation of radiology reports is of great significance, which can reduce the workload of doctors and improve the accuracy and reliability of medical diagnosis and treatment, and has attracted wide attention in recent years. Cross-modal mapping between images and text, a key component of generating high-quality reports, is challenging due to the lack of corresponding annotations. Despite its importance, previous studies have often overlooked it or lacked adequate designs for this crucial component. In this paper, we propose a method with memory alignment embedding to assist the model in aligning visual and textual features to generate a coherent and informative report. Specifically, we first get the memory alignment embedding by querying the memory matrix, where the query is derived from a combination of the visual features and their corresponding positional embeddings. Then the alignment between the visual and textual features can be guided by the memory alignment embedding during the generation process. The comparison experiments with other alignment methods show that the proposed alignment method is less costly and more effective. The proposed approach achieves better performance than state-of-the-art approaches on two public datasets IU X-Ray and MIMIC-CXR, which further demonstrates the effectiveness of the proposed alignment method.
AB - The automatic generation of radiology reports is of great significance, which can reduce the workload of doctors and improve the accuracy and reliability of medical diagnosis and treatment, and has attracted wide attention in recent years. Cross-modal mapping between images and text, a key component of generating high-quality reports, is challenging due to the lack of corresponding annotations. Despite its importance, previous studies have often overlooked it or lacked adequate designs for this crucial component. In this paper, we propose a method with memory alignment embedding to assist the model in aligning visual and textual features to generate a coherent and informative report. Specifically, we first get the memory alignment embedding by querying the memory matrix, where the query is derived from a combination of the visual features and their corresponding positional embeddings. Then the alignment between the visual and textual features can be guided by the memory alignment embedding during the generation process. The comparison experiments with other alignment methods show that the proposed alignment method is less costly and more effective. The proposed approach achieves better performance than state-of-the-art approaches on two public datasets IU X-Ray and MIMIC-CXR, which further demonstrates the effectiveness of the proposed alignment method.
UR - http://www.scopus.com/inward/record.url?scp=85189533312&partnerID=8YFLogxK
U2 - 10.1609/aaai.v38i5.28279
DO - 10.1609/aaai.v38i5.28279
M3 - Conference contribution
AN - SCOPUS:85189533312
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 4776
EP - 4783
BT - Technical Tracks 14
A2 - Wooldridge, Michael
A2 - Dy, Jennifer
A2 - Natarajan, Sriraam
PB - Association for the Advancement of Artificial Intelligence
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
Y2 - 20 February 2024 through 27 February 2024
ER -