Cross-modal Network of Mining Text-knowledge for Radiology Report Generation

Biyu Yan*, Jifu Guan, Yating Zhang, Zhenyi Kang, Zhendong Liu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Automatic radiology report generation aims to produce accurate, fluent diagnostic reports from radiology images, reducing the burden on radiologists and improving the accuracy of disease diagnosis. However, the field still faces several challenges. First, medical images are highly similar to one another, so fine-grained visual differences combined with data bias in the dataset can cause disease details to be neglected. Second, medical reports require detailed, fluent long paragraphs rather than a single short descriptive sentence. To address these challenges, this paper proposes a cross-modal network based on text-knowledge mining for radiology report generation. The model uses a Cross-modal Memory Network to facilitate image-text interaction. We then cluster the ground-truth reports and use the clustering results as a second label to learn fine-grained visual details related to the text. In addition, a medical guiding vocabulary is introduced to improve image encoding and thereby mitigate data bias. The proposed method performs well on the benchmark dataset IU X-Ray, outperforming many state-of-the-art models. Ablation experiments further demonstrate the effectiveness of the proposed components.
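The idea of clustering ground-truth reports and reusing the cluster index as a second label can be illustrated with a minimal sketch. The abstract does not state which text features or clustering algorithm the authors use, so everything here is an assumption made for illustration: reports are represented as L2-normalised bag-of-words vectors, clustered with a plain k-means that is seeded deterministically from the first k reports, and the function names (`bag_of_words`, `kmeans`, `cluster_reports`) are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(report, vocab):
    """Return an L2-normalised word-count vector for one report."""
    counts = Counter(report.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iters=20):
    """Plain k-means; centroids start at the first k vectors (assumption)."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_reports(reports, k):
    """Map each report to a cluster id usable as an auxiliary label."""
    vocab = sorted({word for r in reports for word in r.lower().split()})
    vectors = [bag_of_words(r, vocab) for r in reports]
    return kmeans(vectors, k)


if __name__ == "__main__":
    # Toy reports, invented for illustration only (not from IU X-Ray).
    reports = [
        "no acute cardiopulmonary abnormality",
        "cardiomegaly with pleural effusion",
        "no acute cardiopulmonary disease",
        "pleural effusion with mild cardiomegaly",
    ]
    second_labels = cluster_reports(reports, k=2)
    # Each image would be paired with its report's cluster id as an
    # extra classification target alongside the report text itself.
    print(second_labels)
```

In a training pipeline, the returned cluster ids could serve as targets for an auxiliary classification head on the image encoder, which is one plausible reading of "second label"; the paper itself should be consulted for the actual design.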

Original language: English
Title of host publication: Proceedings of the 43rd Chinese Control Conference, CCC 2024
Editors: Jing Na, Jian Sun
Publisher: IEEE Computer Society
Pages: 8637-8642
Number of pages: 6
ISBN (Electronic): 9789887581581
Publication status: Published - 2024
Event: 43rd Chinese Control Conference, CCC 2024 - Kunming, China
Duration: 28 Jul 2024 to 31 Jul 2024

Publication series

Name: Chinese Control Conference, CCC
ISSN (Print): 1934-1768
ISSN (Electronic): 2161-2927

Conference

Conference: 43rd Chinese Control Conference, CCC 2024
Country/Territory: China
City: Kunming
Period: 28/07/24 to 31/07/24

Keywords

  • Cross-modal Memory Networks
  • Image Caption
  • Medical Guiding Vocabulary
  • Radiology Report Generation
  • Text Clustering
