Rethinking the CNN-Transformer Hybrid Architecture for Infrared and Visible Image Fusion from a Frequency Perspective

Research output: Contribution to journal › Article › Peer-review

Abstract

CNN-Transformer hybrid architectures have demonstrated remarkable success in image fusion. However, existing studies often oversimplify the respective strengths of CNNs and Transformers, attributing them solely to their abilities to capture local and global information. This characterization is neither precise nor reflective of their fundamental differences, and it has led to network designs that rely heavily on empirical practice rather than theoretical insight. To address this limitation and improve the interpretability of fusion networks, we conduct a Fourier analysis that elucidates the distinct roles of CNNs and Transformers from a novel frequency perspective. Our analysis reveals that CNNs and Transformers exhibit contrasting properties in the frequency domain. Yet prior CNN-Transformer hybrid architectures have failed to exploit these properties, resulting in a loss of low-frequency contextual information. Based on this finding, we propose the Frequency Complementary learning and Rebalancing Fusion Network (FCRNet), which leverages the complementary behaviors of CNNs and Transformers to optimize image fusion. First, a frequency complementary learning mechanism integrates the strengths of CNNs and Transformers in representation learning, effectively capturing essential low-frequency features and rich high-frequency details. Second, a frequency rebalancing module is designed to correct the imbalance between high- and low-frequency components in the synthesized feature maps, so that their frequency distribution aligns with that of natural images. Extensive experiments on multiple benchmarks validate the effectiveness and superiority of FCRNet. Additionally, experiments on downstream tasks validate its practicality.
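
The abstract does not give FCRNet's formulas, so the following is only a hedged illustration of the kind of frequency analysis it describes, not the authors' method. The NumPy sketch measures what fraction of a (C, H, W) feature map's non-DC spectral energy lies above a radial cutoff, and then applies a toy "rebalancing" step that rescales the whole high-frequency band per channel so that this fraction hits a chosen target. The function names, the cutoff of 0.5 (in units where 1.0 is the Nyquist frequency along one axis), the target ratio of 0.3, and the use of a fixed box blur and Laplacian as stand-ins for low-pass and high-pass operators are all assumptions made for illustration.

```python
import numpy as np


def radial_frequency(h, w):
    """Distance of each FFT bin from DC, scaled so 0.5 cycles/sample maps to 1.0."""
    fy = np.fft.fftfreq(h)  # cycles per sample, in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)
    return np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2) / 0.5


def high_frequency_ratio(feature, cutoff=0.5):
    """Mean fraction of non-DC spectral energy above `cutoff`, for a (C, H, W) array."""
    _, h, w = feature.shape
    centered = feature - feature.mean(axis=(1, 2), keepdims=True)  # zero the DC bin
    energy = np.abs(np.fft.fft2(centered)) ** 2  # FFT over the last two axes
    high = radial_frequency(h, w) > cutoff  # boolean (H, W) mask
    total = energy.sum(axis=(1, 2))
    ratio = energy[:, high].sum(axis=1) / np.maximum(total, 1e-12)
    return float(ratio.mean())


def rebalance_high_frequencies(feature, target_ratio, cutoff=0.5, eps=1e-12):
    """Hypothetical toy rebalancing: scale each channel's high band to a target energy share.

    Solves g**2 * E_high / (E_low + g**2 * E_high) = target_ratio for the gain g.
    """
    _, h, w = feature.shape
    mean = feature.mean(axis=(1, 2), keepdims=True)
    spectrum = np.fft.fft2(feature - mean)
    energy = np.abs(spectrum) ** 2
    high = radial_frequency(h, w) > cutoff
    e_high = energy[:, high].sum(axis=1)
    e_low = energy[:, ~high].sum(axis=1)
    gain = np.sqrt(target_ratio * e_low / np.maximum((1.0 - target_ratio) * e_high, eps))
    # Real gains on a mask that is symmetric about DC keep the spectrum Hermitian,
    # so the inverse FFT is real up to rounding error.
    gains = np.where(high[None, :, :], gain[:, None, None], 1.0)
    return np.fft.ifft2(spectrum * gains).real + mean


def box_blur(x):
    """Circular 3x3 mean filter: a simple low-pass operator."""
    return sum(
        np.roll(x, (dy, dx), axis=(1, 2)) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
    ) / 9.0


def laplacian(x):
    """Circular 5-point Laplacian: a simple high-pass operator."""
    return 4 * x - sum(np.roll(x, s, axis=a) for s in (-1, 1) for a in (1, 2))


rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 32, 32))  # stand-in for a (C, H, W) feature map
for name, fmap in [("blur", box_blur(noise)), ("noise", noise), ("laplacian", laplacian(noise))]:
    print(f"{name:9s} high-frequency ratio: {high_frequency_ratio(fmap):.3f}")

balanced = rebalance_high_frequencies(box_blur(noise), target_ratio=0.3)
print(f"rebalanced high-frequency ratio: {high_frequency_ratio(balanced):.3f}")
```

Scaling an entire band by a single gain is the crudest possible form of rebalancing: it changes only the split of energy between the low and high bands and leaves the spectral shape within each band untouched. The per-channel mean (the DC component) is also preserved.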

Original language: English
Journal: IEEE Transactions on Multimedia
Publication status: Accepted/In press, 2026
Externally published: Yes

Keywords

  • CNN-Transformer hybrid architecture
  • Fourier analysis
  • frequency complementary learning
  • image fusion
