D2Fusion: Dual-AE and latent-guided diffusion network for infrared and visible image fusion

  • Shizun Sun
  • Junwei Xu
  • Shuo Han
  • Jie Zhao
  • Bo Mo (corresponding author)
Research output: Contribution to journal › Article › peer-review

Abstract

Infrared and visible image (IVI) fusion retains the complementary advantages of both modalities and is widely used in advanced vision tasks such as object detection. To improve the quality and visual fidelity of fused images, this paper proposes D2Fusion, a dual auto-encoder (dual-AE) and latent-guided diffusion fusion network for IVI. Firstly, D2Fusion uses a dual-branch residual AE to extract features from the infrared and visible images separately, addressing the discrepancy in feature extraction between the two modalities. Secondly, a dynamic cross-attention fusion network (DCAFNet) is introduced to deeply explore the spatial correlations between IVI features, enabling higher-quality fusion. Thirdly, the fused features are used as the condition to guide the diffusion model, enhancing the fusion effect while improving cross-scene generalization. Finally, a multi-level dense-connected decoder (MDD) is proposed to generate the fused image; through dense connections, MDD allows deep global information to better complement shallow features, thereby enhancing the feature representation capability of the image. Extensive experiments on several public datasets show that the proposed method achieves superior fusion performance and generates higher-quality fused images than existing state-of-the-art methods. Specifically, D2Fusion achieves an EN of 6.9131, an SD of 43.0563, and an SSIM of 0.9921 on the MSRS dataset.
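The abstract describes the pipeline only at a high level; the paper itself defines the concrete layers. As a rough illustration of the first two stages (dual-branch residual encoding and cross-attention fusion), the PyTorch sketch below is a minimal, hypothetical rendering: the module names (ResidualEncoder, CrossAttentionFusion), channel sizes, and attention configuration are assumptions for illustration, not the authors' implementation of D2Fusion or DCAFNet.

```python
import torch
import torch.nn as nn

class ResidualEncoder(nn.Module):
    """One branch of a dual-AE: conv stem plus a residual block (hypothetical sizes)."""
    def __init__(self, in_ch=1, dim=64):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, dim, 3, padding=1)
        self.block = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, x):
        h = self.stem(x)
        return h + self.block(h)  # residual connection

class CrossAttentionFusion(nn.Module):
    """Toy stand-in for a cross-attention fusion step: each modality
    attends to the other over spatial positions, then features are merged."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn_ir = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.merge = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, f_ir, f_vis):
        b, c, h, w = f_ir.shape
        ir = f_ir.flatten(2).transpose(1, 2)   # (B, HW, C) token sequence
        vis = f_vis.flatten(2).transpose(1, 2)
        ir2vis, _ = self.attn_ir(ir, vis, vis)  # IR queries attend to visible features
        vis2ir, _ = self.attn_vis(vis, ir, ir)  # visible queries attend to IR features
        ir2vis = ir2vis.transpose(1, 2).reshape(b, c, h, w)
        vis2ir = vis2ir.transpose(1, 2).reshape(b, c, h, w)
        return self.merge(torch.cat([ir2vis, vis2ir], dim=1))

# Dual-branch encoding followed by fusion; in the paper the fused latent
# then conditions the diffusion model and is decoded by MDD (not sketched here).
enc_ir, enc_vis = ResidualEncoder(), ResidualEncoder()
fuse = CrossAttentionFusion()
ir, vis = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
fused = fuse(enc_ir(ir), enc_vis(vis))
print(fused.shape)  # torch.Size([1, 64, 64, 64])
```

The design intuition behind the cross-attention step is that each modality queries the other's spatial feature map, which matches the abstract's claim of exploiting spatial correlations between the infrared and visible features rather than merging them by simple concatenation or averaging.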

Original language: English
Article number: 131279
Journal: Neurocomputing
Volume: 654
Publication status: Published - 14 Nov 2025
Externally published: Yes

Keywords

  • Auto-encoder
  • Image fusion
  • Latent-guided
