TY - GEN
T1 - Cross-modal Network of Mining Text-knowledge for Radiology Report Generation
AU - Yan, Biyu
AU - Guan, Jifu
AU - Zhang, Yating
AU - Kang, Zhenyi
AU - Liu, Zhendong
N1 - Publisher Copyright:
© 2024 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2024
Y1 - 2024
N2 - Automatic radiology report generation aims to produce accurate and fluent diagnostic reports from radiology images, thereby reducing the burden on radiologists and improving the accuracy of disease diagnosis. However, the field still faces several challenges. First, medical images are highly similar to one another, and fine-grained visual differences together with data bias in the dataset can cause disease details to be overlooked. Second, medical reports require detailed and fluent long-paragraph descriptions rather than a single short-sentence caption. To address these limitations, this paper proposes a cross-modal network based on text-knowledge mining for radiology report generation. The model uses a Cross-modal Memory Network to facilitate image-text interaction. We then cluster the ground-truth reports and use the clustering results as secondary labels to learn fine-grained visual details related to the text. In addition, a medical guiding vocabulary is introduced to improve image encoding and mitigate data bias. Our proposed method performs well on the benchmark IU X-Ray dataset, outperforming many state-of-the-art models. Furthermore, we provide ablation experiments to demonstrate the effectiveness of the proposed components.
AB - Automatic radiology report generation aims to produce accurate and fluent diagnostic reports from radiology images, thereby reducing the burden on radiologists and improving the accuracy of disease diagnosis. However, the field still faces several challenges. First, medical images are highly similar to one another, and fine-grained visual differences together with data bias in the dataset can cause disease details to be overlooked. Second, medical reports require detailed and fluent long-paragraph descriptions rather than a single short-sentence caption. To address these limitations, this paper proposes a cross-modal network based on text-knowledge mining for radiology report generation. The model uses a Cross-modal Memory Network to facilitate image-text interaction. We then cluster the ground-truth reports and use the clustering results as secondary labels to learn fine-grained visual details related to the text. In addition, a medical guiding vocabulary is introduced to improve image encoding and mitigate data bias. Our proposed method performs well on the benchmark IU X-Ray dataset, outperforming many state-of-the-art models. Furthermore, we provide ablation experiments to demonstrate the effectiveness of the proposed components.
KW - Cross-modal Memory Networks
KW - Image Captioning
KW - Medical Guiding Vocabulary
KW - Radiology Report Generation
KW - Text Clustering
UR - http://www.scopus.com/inward/record.url?scp=85205487912&partnerID=8YFLogxK
U2 - 10.23919/CCC63176.2024.10662724
DO - 10.23919/CCC63176.2024.10662724
M3 - Conference contribution
AN - SCOPUS:85205487912
T3 - Chinese Control Conference, CCC
SP - 8637
EP - 8642
BT - Proceedings of the 43rd Chinese Control Conference, CCC 2024
A2 - Na, Jing
A2 - Sun, Jian
PB - IEEE Computer Society
T2 - 43rd Chinese Control Conference, CCC 2024
Y2 - 28 July 2024 through 31 July 2024
ER -