Enhancing discriminative ability in multimodal LLMs: A contrastive learning approach for CT report generation

Qingyong Su, Chong Feng, Ge Shi*, Bo Wang, Yan Zhuang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Automated CT report generation (CTRG) systems hold significant promise for enhancing clinical workflows. However, current approaches, including those leveraging advanced multimodal large language models (MLLMs), still struggle to ensure the quality and reliability of generated reports. A comprehensive analysis of representation dynamics within MLLM-based CTRG models in this study reveals two primary limitations: the entanglement of reports of varying quality in the representation space, and clinical detail blindness, which stems from traditional training paradigms that focus primarily on ground-truth reports. To address these limitations, we propose a novel contrastive learning framework with three main contributions: (1) a systematic method for generating clinically relevant hard negative reports using GPT-4, which introduces realistic but subtle clinical errors while maintaining report structure and plausibility; (2) a contrastive learning approach that leverages reports of varying quality to disentangle quality representations and enhance the model's sensitivity to clinical details; and (3) a hard negative mining strategy designed to tackle false negatives and optimize the sampling weights of negatives with varying degrees of semantic effectiveness. Extensive experiments on the CTRG-Chest-548K and CTRG-Brain-263K datasets demonstrate significant improvements in natural language generation (NLG) performance, including a 14% increase in BLEU-1 and 17% improvements in both BLEU-4 and ROUGE-L scores on the CTRG-Chest-548K dataset, compared to current state-of-the-art methods.
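
As a rough illustration of contributions (2) and (3), the sketch below shows one common way a contrastive objective with per-negative weights can be written. It is not the formulation used in the article, which the abstract does not specify: the function names, the cosine similarity, the temperature value, and the idea of setting a weight to zero for a suspected false negative are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_contrastive_loss(anchor, positive, negatives, weights, temperature=0.1):
    """InfoNCE-style loss with one weight per negative (hypothetical helper).

    anchor    -- embedding of the CT scan (assumed to be given)
    positive  -- embedding of the ground-truth report
    negatives -- embeddings of lower-quality (hard negative) reports
    weights   -- non-negative sampling weights; 0.0 removes a negative,
                 e.g. one suspected of being a false negative
    """
    pos_term = math.exp(cosine(anchor, positive) / temperature)
    neg_term = sum(
        w * math.exp(cosine(anchor, n) / temperature)
        for n, w in zip(negatives, weights)
    )
    return -math.log(pos_term / (pos_term + neg_term))


# Toy 2-D embeddings, invented purely for illustration.
scan = [1.0, 0.0]
good_report = [1.0, 0.0]
subtle_error_report = [0.9, 0.1]   # close to the scan: a "hard" negative
unrelated_report = [0.0, 1.0]      # far from the scan: an "easy" negative

hard_loss = weighted_contrastive_loss(scan, good_report, [subtle_error_report], [1.0])
easy_loss = weighted_contrastive_loss(scan, good_report, [unrelated_report], [1.0])
print(hard_loss > easy_loss)
```

In this toy setting the hard negative yields the larger loss, which reflects the intuition that near-miss reports carry more training signal than obviously wrong ones. Lowering a weight toward zero reduces a negative's influence.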

Original language: English
Article number: 103240
Journal: Information Fusion
Volume: 123
Publication status: Published - Nov 2025
Externally published: Yes

Keywords

  • Contrastive learning
  • CT report generation
  • Multimodal LLMs
  • Representation learning
