TY - GEN
T1 - Cross-modal Network of Mining Text-knowledge for Radiology Report Generation
AU - Yan, Biyu
AU - Guan, Jifu
AU - Zhang, Yating
AU - Kang, Zhenyi
AU - Liu, Zhendong
N1 - Publisher Copyright:
© 2024 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2024
Y1 - 2024
N2 - Automatic radiology report generation aims to produce accurate and fluent diagnostic reports from radiology images, thereby reducing the burden on radiologists and improving the accuracy of disease diagnosis. However, the field still faces several challenges. First, medical images are highly similar to one another, and fine-grained visual differences together with data bias in the dataset can cause disease details to be overlooked. Second, medical reports require detailed and fluent long-paragraph descriptions rather than a single short-sentence caption. To address these limitations, this paper proposes a cross-modal network based on text-knowledge mining for radiology report generation. The model uses a Cross-modal Memory Network to facilitate image-text interaction. We then cluster the ground-truth reports and use the clustering results as secondary labels to learn fine-grained visual details related to the text. In addition, a medical guiding vocabulary is introduced to improve image encoding and mitigate data bias. Our proposed method performs well on the benchmark IU X-Ray dataset, outperforming many state-of-the-art models. Furthermore, we provide ablation experiments to demonstrate the effectiveness of the proposed components.
AB - Automatic radiology report generation aims to produce accurate and fluent diagnostic reports from radiology images, thereby reducing the burden on radiologists and improving the accuracy of disease diagnosis. However, the field still faces several challenges. First, medical images are highly similar to one another, and fine-grained visual differences together with data bias in the dataset can cause disease details to be overlooked. Second, medical reports require detailed and fluent long-paragraph descriptions rather than a single short-sentence caption. To address these limitations, this paper proposes a cross-modal network based on text-knowledge mining for radiology report generation. The model uses a Cross-modal Memory Network to facilitate image-text interaction. We then cluster the ground-truth reports and use the clustering results as secondary labels to learn fine-grained visual details related to the text. In addition, a medical guiding vocabulary is introduced to improve image encoding and mitigate data bias. Our proposed method performs well on the benchmark IU X-Ray dataset, outperforming many state-of-the-art models. Furthermore, we provide ablation experiments to demonstrate the effectiveness of the proposed components.
KW - Cross-modal Memory Networks
KW - Image Captioning
KW - Medical Guiding Vocabulary
KW - Radiology Report Generation
KW - Text Clustering
UR - http://www.scopus.com/inward/record.url?scp=85205487912&partnerID=8YFLogxK
U2 - 10.23919/CCC63176.2024.10662724
DO - 10.23919/CCC63176.2024.10662724
M3 - Conference contribution
AN - SCOPUS:85205487912
T3 - Chinese Control Conference, CCC
SP - 8637
EP - 8642
BT - Proceedings of the 43rd Chinese Control Conference, CCC 2024
A2 - Na, Jing
A2 - Sun, Jian
PB - IEEE Computer Society
T2 - 43rd Chinese Control Conference, CCC 2024
Y2 - 28 July 2024 through 31 July 2024
ER -