TY - JOUR
T1 - Degradation-Resistant Infrared-Visible Image Fusion With Auto-Generated Textual Objectives and Embedded Contrastive Learning
AU - Wang, Yuhao
AU - Miao, Lingjuan
AU - Zhou, Zhiqiang
AU - Qiao, Yajun
AU - Jiao, Yixuan
N1 - Publisher Copyright: © 2000-2011 IEEE.
PY - 2026
Y1 - 2026
AB - Infrared-visible image fusion aims to combine multi-modal image information to generate informative and robust scene representations, thereby enhancing perception capabilities and reliability in intelligent transportation systems. However, captured images often suffer from complex degradation issues, leading to low-quality source data. Existing methods are deficient in adapting to multiple degradation conditions, which limits their fusion performance. In this paper, we aim to develop a degradation-resistant image fusion method that automatically adapts to various degradations. For this purpose, we first construct an auto-generation prompt pipeline based on cascaded multi-modal and language models. It utilizes the vision-language understanding capabilities of large models to comprehensively detect degradation, then produces degradation prompts and corresponding text-based fusion objectives for each image. To resist degradations and produce the fusion results as described by fusion objectives, we next propose an embedded contrastive learning method within CLIP space to supervise the model training. This method ensures that the image fusion process is free from degradation and better aligned with the fusion objectives, which enhances the fusion model’s anti-degradation capability. Extensive experiments on public datasets validate the superiority and generalization ability of our method, and its robust degradation-adaptive capability makes it particularly suitable for complex scenes.
KW - auto-generated fusion objective
KW - degradation resistance
KW - embedded contrastive learning
KW - infrared and visible image fusion
UR - https://www.scopus.com/pages/publications/105039562396
U2 - 10.1109/TITS.2026.3691773
DO - 10.1109/TITS.2026.3691773
M3 - Article
AN - SCOPUS:105039562396
SN - 1524-9050
JO - IEEE Transactions on Intelligent Transportation Systems
JF - IEEE Transactions on Intelligent Transportation Systems
ER -