VTAG: Visual-Textual Association Guided Radiology Reports Generation

Research output: Contribution to journal › Article › peer-review

Abstract

Radiology report generation, which automatically produces diagnostic textual reports from medical images, plays a crucial role in improving clinical efficiency and diagnostic accuracy. However, existing radiology report generation models face numerous challenges, such as a lack of interpretability and inaccurate descriptions. To address these issues, we propose an integrated framework that enhances radiology report generation by combining target detection with contextual alignment of relevant region descriptions. Target detection focuses on clinically significant areas within medical images, while contextual alignment ensures that the generated text is directly linked to visual findings. Additionally, we introduce a full-spectrum feature fusion method that combines both high- and low-frequency features from the images. This approach captures both fine details and broader structures, allowing the model to gain a more comprehensive and hierarchical understanding of the images. We validated the effectiveness of our method on the public MIMIC-CXR dataset. The results indicate that our method outperforms previous approaches on multiple evaluation metrics. Notably, averaged over the six traditional metrics, our method (VTAG) achieved a significant improvement of 14.3% over the state-of-the-art model MLRG.
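The abstract does not specify how the high- and low-frequency features are extracted, so the following is only a minimal illustrative sketch of one common way to split an image into low- and high-frequency components with an FFT low-pass mask; the function name and the `cutoff` parameter are hypothetical and not part of VTAG:

```python
import numpy as np

def spectrum_split(image, cutoff=0.1):
    """Split a 2-D image into low- and high-frequency components.

    Illustrative only: VTAG's actual fusion module is not described in
    the abstract, and `cutoff` (fraction of the shorter image side used
    as the low-pass radius) is an assumed parameter.
    """
    # shift the zero-frequency bin to the center of the spectrum
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    # circular low-pass mask around the spectrum center
    radius = cutoff * min(h, w)
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
    # the high-frequency residual carries edges and fine detail
    high = image - low
    return low, high

img = np.random.rand(64, 64)
low, high = spectrum_split(img)
# a simple weighted sum stands in for the learned fusion step
fused = 0.5 * low + 0.5 * high
```

By construction the two components sum back to the original image, so no information is discarded before fusion; in the paper's setting the combination would be learned rather than a fixed weighted sum.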

Original language: English
Pages (from-to): 7391-7406
Number of pages: 16
Journal: IEEE Transactions on Image Processing
Volume: 34
DOIs
Publication status: Published - 2025

Keywords

  • Domain knowledge query enhancement
  • disease target region and description alignment
  • full spectrum feature fusion
  • radiology report generation
