ISCDFuse: Interval sampling correlation driven visual state space models for multimodal image fusion

Lian Zhang, Lingxue Wang*, Yuzhen Wu, Mingkun Chen, Dezhi Zheng, Yi Cai

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal image fusion aims to retain the functional highlights and detailed textures of different modalities. To address the shortcomings of existing methods in time complexity and in the efficiency of cross-modal global information extraction, we propose a cross-domain distance learning image fusion framework based on Visual State Space Models (VSSMs): the Interval Sampling Correlation-Driven Fusion Network (ISCDFuse). ISCDFuse employs a dual-branch feature extractor structure comprising a Cross-domain Feature Association Encoder (CFAE), a High- and Low-frequency feature Extraction (HLoE) module, and a Vmamba-based Decoder (VD) for feature fusion and image generation. The CFAE traverses the spatial domains of the two modalities using an interval sampling cross-scan module, converting non-causal visual images into ordered patch sequences and thereby enhancing the correlation of global features across modalities. The low-frequency feature extractors in the HLoE and VD modules both use a residual visual mamba structure with a multi-directional skip scanning approach that samples the image at bi-stride, which enhances deep semantic feature extraction and effectively models long-distance spatial dependencies. The high-frequency feature extractor employs an Invertible Neural Network (INN) block to extract fine texture details. Extensive experiments demonstrate that ISCDFuse delivers excellent fusion performance at high speed on both visible-infrared image fusion and medical image fusion. Notably, in unified benchmark tests, ISCDFuse demonstrates its practical value for downstream multimodal image processing tasks such as visible-infrared object detection.
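
The skip scanning idea mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes that "bi-stride" means a sampling stride of two and that each sampled sub-grid is flattened in row-major order. The function names `interval_scan` and `interval_merge` are hypothetical, and plain integers stand in for image patches.

```python
def interval_scan(grid, stride=2):
    """Split a 2D grid of patches into stride * stride sub-sequences.

    Assumption for illustration: each sub-sequence collects the patches at
    rows dy, dy + stride, ... and columns dx, dx + stride, ..., flattened
    in row-major order.
    """
    height, width = len(grid), len(grid[0])
    sequences = []
    for dy in range(stride):
        for dx in range(stride):
            seq = [grid[y][x]
                   for y in range(dy, height, stride)
                   for x in range(dx, width, stride)]
            sequences.append(seq)
    return sequences


def interval_merge(sequences, height, width, stride=2):
    """Inverse of interval_scan: scatter the sub-sequences back onto the grid."""
    grid = [[None] * width for _ in range(height)]
    index = 0
    for dy in range(stride):
        for dx in range(stride):
            values = iter(sequences[index])
            for y in range(dy, height, stride):
                for x in range(dx, width, stride):
                    grid[y][x] = next(values)
            index += 1
    return grid


# A 4x4 grid of patch indices standing in for image patches.
patches = [[row * 4 + col for col in range(4)] for row in range(4)]
scanned = interval_scan(patches)          # four sequences of four patches each
restored = interval_merge(scanned, 4, 4)  # identical to the original grid
```

With a stride of two, each sequence is a quarter of the original length while still spanning the whole image, which is one plausible reading of how interval sampling keeps long-range context at lower sequence cost.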

Original language: English
Article number: 130329
Journal: Neurocomputing
Volume: 640
Publication status: Published - 1 Aug 2025

Keywords

  • Interval sampling
  • Multimodal image fusion
  • VIF and MIF
  • Visual State Space Models
