TY - GEN
T1 - Focusing on Abnormal
T2 - 32nd International Conference on Neural Information Processing, ICONIP 2025
AU - Chen, Haoquan
AU - Pei, Mingtao
AU - Nie, Zhengang
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
N2 - Automating medical report generation from radiology images is vital for accurate, standardized diagnoses. Abnormal regions in medical images, often small and rare, are challenging to detect, leading models to overlook critical disease features and generate repetitive healthy content. We propose V-C2SE, a novel model that enhances abnormality detection through two key strategies within the visual modality: 1) Visual Contrastive Classification (VC2), which aligns disease-specific features across random samples using contrastive learning, improving the model’s focus on abnormal semantics during encoding; 2) Visual Semantic Enhancement (VSE), which constructs healthy templates to amplify abnormal features in a feature-space augmentation paradigm, ensuring precise report generation. By leveraging contrastive learning and healthy templates, V-C2SE detects subtle abnormalities with high precision and generates clinically relevant reports. Evaluated on IU X-Ray and MIMIC-CXR datasets, V-C2SE achieves competitive results with state-of-the-art methods across natural language generation (NLG) and clinical efficacy (CE) metrics, producing high-quality, semantically accurate reports. Our approach addresses the critical challenge of focusing on rare abnormalities and enhancing diagnostic efficiency.
KW - Contrastive Classification
KW - Focus on Abnormal
KW - Medical Report Generation
KW - Semantic Enhancement
UR - https://www.scopus.com/pages/publications/105023303698
U2 - 10.1007/978-981-95-4100-3_10
DO - 10.1007/978-981-95-4100-3_10
M3 - Conference contribution
AN - SCOPUS:105023303698
SN - 9789819540990
T3 - Communications in Computer and Information Science
SP - 132
EP - 147
BT - Neural Information Processing - 32nd International Conference, ICONIP 2025, Proceedings
A2 - Taniguchi, Tadahiro
A2 - Kozuno, Tadashi
A2 - Leung, Chi Sing Andrew
A2 - Yoshimoto, Junichiro
A2 - Mahmud, Mufti
A2 - Doborjeh, Maryam
A2 - Doya, Kenji
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 20 November 2025 through 24 November 2025
ER -