TY - JOUR
T1 - VTAG: Visual-Textual Association Guided Radiology Reports Generation
AU - Su, Zhaoli
AU - Lin, Yucong
AU - Song, Hong
AU - Jian, Ruoyi
AU - Liu, Bowen
AU - Yang, Jian
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Radiology report generation, which automatically produces diagnostic textual reports from medical images, plays a crucial role in improving clinical efficiency and diagnostic accuracy. However, existing radiology report generation models face numerous challenges, such as a lack of interpretability and inaccurate descriptions. To address these issues, we propose an integrated framework that enhances radiology report generation by combining target detection with contextual alignment of the corresponding region descriptions. Target detection focuses on clinically significant areas within medical images, while contextual alignment ensures that the generated text is directly linked to visual findings. Additionally, we introduce a full-spectrum feature fusion method that combines both high- and low-frequency image features. This approach captures fine details as well as broader structures, giving the model a more comprehensive and hierarchical understanding of the images. We validated the effectiveness of our method on the public MIMIC-CXR dataset. The results indicate that our method outperforms previous approaches on multiple evaluation metrics. Notably, on the average of the six traditional metrics, our method (VTAG) achieves a significant improvement of 14.3% over the state-of-the-art model MLRG.
KW - Domain knowledge query enhancement
KW - disease target region and description alignment
KW - full spectrum feature fusion
KW - radiology report generation
UR - https://www.scopus.com/pages/publications/105020413211
U2 - 10.1109/TIP.2025.3623915
DO - 10.1109/TIP.2025.3623915
M3 - Article
AN - SCOPUS:105020413211
SN - 1057-7149
VL - 34
SP - 7391
EP - 7406
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -