Abstract
Multi-modality fusion improves perceptual robustness and accuracy by leveraging multi-source sensor data, yet current RGB-T fusion methods still falter under adverse illumination and weather. Recent advances in generative methods, especially diffusion models, have shown the ability to enhance visible images under adverse conditions. However, RGB-T fusion still suffers from cross-modal feature loss, sensitivity to environmental interference, and prolonged generation times. These limitations arise from: (1) the difficulty of sufficiently extracting modality-specific information within shared forward networks alone; (2) neglect of the interference introduced by adverse weather conditions; (3) the multi-step denoising process of diffusion-based models, which inflates inference time. To overcome these challenges, we propose a novel conditional diffusion model for RGB-T image fusion, named CDMFusion, which incorporates: (1) a three-branch network designed for fusion that preserves information more fully; (2) a multi-scene adaptive feature enhancer that dynamically enhances valuable features while mitigating interference; (3) a novel skip patrol mechanism that enables high-quality generation via two-step denoising without extra training. Additionally, a new multi-scene RGB-T image dataset and a multi-interference dataset are released for comprehensive evaluation. Experiments demonstrate that our method achieves superior performance on 7 datasets compared with 14 state-of-the-art methods. Code and datasets are available at https://github.com/yangluojie/CDM.
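To make the two-step denoising idea concrete, the sketch below shows a generic deterministic DDIM-style sampler restricted to two timesteps, which is the standard way to skip most of a diffusion model's denoising schedule at inference. This is a minimal illustration only, not the paper's skip patrol mechanism; the function names, the `denoiser(x, t, condition)` signature, and the chosen timesteps are all assumptions for the example.

```python
# Minimal sketch of few-step denoising via timestep skipping (generic DDIM-style
# sampling with eta = 0). NOT the paper's skip patrol mechanism; all names and
# the two-timestep schedule here are illustrative assumptions.
import torch

@torch.no_grad()
def two_step_ddim_sample(denoiser, x_T, condition, alphas_cumprod, steps=(999, 499)):
    """Deterministically denoise x_T by visiting only a small, descending
    subset of the training timesteps (here two) instead of the full schedule.

    denoiser(x, t, condition) -> predicted noise eps, same shape as x
    alphas_cumprod: 1-D tensor of cumulative alpha products, length T
    steps: descending subset of timesteps to visit
    """
    x = x_T
    ts = list(steps) + [-1]  # -1 marks "fully denoised" (alpha_bar = 1)
    for t, t_prev in zip(ts[:-1], ts[1:]):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t_prev] if t_prev >= 0 else x.new_tensor(1.0)
        t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
        eps = denoiser(x, t_batch, condition)           # predict the noise
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # estimate the clean image
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # jump directly to t_prev
    return x
```

With a conditional denoiser, `condition` would carry the fused RGB-T guidance; collapsing the schedule to two steps trades per-step refinement for a large reduction in inference time, which is the cost the abstract attributes to multi-step diffusion sampling.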
| Original language | English |
|---|---|
| Article number | 110437 |
| Journal | Signal Processing |
| Volume | 243 |
| DOIs | |
| Publication status | Published - Jun 2026 |
| Externally published | Yes |
Keywords
- Deep learning
- Diffusion model
- Enhancement
- Image fusion
- Image processing