D2Fusion: Dual-AE and latent-guided diffusion network for infrared and visible image fusion

Abstract
Infrared and visible image (IVI) fusion retains the complementary advantages of both modalities and is widely used in advanced vision tasks such as object detection. To improve the quality and visualization of fused images, a dual auto-encoder (dual-AE) and latent-guided diffusion fusion network (D2Fusion) for IVI is proposed. First, D2Fusion uses a dual-branch residual auto-encoder to extract features from the infrared and visible images separately, addressing the discrepancy in feature extraction between the two modalities. Second, a dynamic cross-attention fusion network (DCAFNet) is introduced to deeply explore the spatial correlations between IVI features, enabling higher-quality fusion. Third, the fused features are used as a condition to guide the diffusion model, enhancing the fusion effect while improving cross-scene generalization. Finally, a multi-level dense-connected decoder (MDD) is proposed to generate the fused image; its dense connections allow deeper global information to better complement shallow features, thereby enhancing the feature representation capability of the image. Extensive experiments on several public datasets show that the proposed method achieves superior fusion performance and produces fused images of higher quality than existing state-of-the-art methods. Specifically, D2Fusion achieves an EN of 6.9131, SD of 43.0563, and SSIM of 0.9921 on the MSRS dataset.
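The abstract does not detail DCAFNet's internals. As a rough illustration of cross-attention-based feature fusion, the sketch below implements plain scaled dot-product cross-attention in NumPy, with visible features querying infrared features; all function names, shapes, and the residual combination are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(vis_feat, ir_feat):
    """Fuse two modalities with scaled dot-product cross-attention.

    vis_feat, ir_feat: (N, C) arrays of flattened spatial features
    (N positions, C channels). Visible tokens act as queries over
    infrared tokens, so thermal information is routed to each
    visible position; a residual sum gives the fused features.
    """
    c = vis_feat.shape[-1]
    scores = vis_feat @ ir_feat.T / np.sqrt(c)   # (N, N) cross-modal affinity
    attn = softmax(scores, axis=-1)              # each row sums to 1
    attended = attn @ ir_feat                    # infrared features weighted per position
    return vis_feat + attended                   # residual fusion

# toy example: 4 spatial positions, 8 channels per modality
rng = np.random.default_rng(0)
vis = rng.standard_normal((4, 8))
ir = rng.standard_normal((4, 8))
fused = cross_attention_fuse(vis, ir)
print(fused.shape)  # (4, 8)
```

In the paper's pipeline, such a fused feature map would then serve as the latent condition for the diffusion model before the MDD decoder reconstructs the final image.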
| Original language | English |
|---|---|
| Article number | 131279 |
| Journal | Neurocomputing |
| Volume | 654 |
| Publication status | Published - 14 Nov 2025 |
| Externally published | Yes |
Keywords
- Auto-encoder
- Image fusion
- Latent-guided