Abstract
CNN-Transformer hybrid architectures have demonstrated remarkable success in image fusion. However, existing studies often oversimplify the respective strengths of CNNs and Transformers, attributing them solely to capturing local and global information, respectively. This characterization is neither precise nor reflective of their fundamental differences, and it leads to network designs that rely heavily on empirical practice rather than theoretical insight. To address this limitation and improve the interpretability of fusion networks, we conduct a Fourier analysis that elucidates the distinct roles of CNNs and Transformers from a novel frequency perspective. Our analysis reveals that CNNs and Transformers exhibit contrasting properties in the frequency domain. However, prior CNN-Transformer hybrid architectures fail to exploit these characteristics, which results in a loss of low-frequency contextual information. Based on this finding, we propose the Frequency Complementary learning and Rebalancing Fusion Network (FCRNet), which leverages the complementary behaviors of CNNs and Transformers to optimize image fusion. First, a frequency complementary learning mechanism integrates the strengths of CNNs and Transformers in representation learning, capturing both essential low-frequency features and rich high-frequency details. Second, a frequency rebalancing module corrects the imbalance between high- and low-frequency components in the synthesized feature maps, so that their frequency distribution aligns with that of natural images. Extensive experiments on multiple benchmarks validate the effectiveness and superiority of FCRNet, and experiments on downstream tasks further validate its practicality.
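The abstract does not give FCRNet's equations, so the following is only a minimal sketch of the kind of Fourier analysis it refers to, under stated assumptions. It splits a 2-D feature map into low- and high-frequency bands with an ideal radial mask in the DFT domain and measures how much energy each band carries. Because the two bands have disjoint spectral support, their energies add (Parseval's theorem). The `rebalance` function is a hypothetical stand-in that simply rescales the high band to hit a chosen energy fraction. It is not the authors' frequency rebalancing module, which targets the frequency distribution of natural images rather than a single scalar. All function names, the `cutoff` parameter, and `target_fraction` are illustrative assumptions.

```python
import numpy as np


def radial_lowpass_mask(h: int, w: int, cutoff: float) -> np.ndarray:
    """True where normalized spatial frequency <= cutoff (1.0 = Nyquist).

    Uses the unshifted fft2 layout. `cutoff` is a hypothetical parameter.
    """
    fy = np.fft.fftfreq(h)  # cycles per pixel, in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2) / 0.5
    return radius <= cutoff


def split_frequencies(x: np.ndarray, cutoff: float = 0.25):
    """Split a 2-D map into low/high-frequency parts with an ideal radial filter."""
    spectrum = np.fft.fft2(x)
    mask = radial_lowpass_mask(*x.shape, cutoff)
    # The mask is conjugate-symmetric, so the inverse transform is real up to rounding.
    low = np.fft.ifft2(spectrum * mask).real
    high = x - low  # complementary band; low + high reconstructs x
    return low, high


def high_frequency_fraction(x: np.ndarray, cutoff: float = 0.25) -> float:
    """Share of total energy (sum of squares) carried by the high band."""
    _, high = split_frequencies(x, cutoff)
    return float(np.sum(high ** 2) / np.sum(x ** 2))


def rebalance(x: np.ndarray, target_fraction: float, cutoff: float = 0.25) -> np.ndarray:
    """Hypothetical rebalancing: scale the high band so it carries
    `target_fraction` of the total energy.

    Solving g^2*E_h / (E_l + g^2*E_h) = t gives g = sqrt(t*E_l / ((1-t)*E_h)).
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie in (0, 1)")
    low, high = split_frequencies(x, cutoff)
    e_low, e_high = np.sum(low ** 2), np.sum(high ** 2)
    if e_high <= 1e-12 * (e_low + e_high):
        return x.copy()  # (near) no high-frequency content to rescale
    gain = np.sqrt(target_fraction * e_low / ((1.0 - target_fraction) * e_high))
    return low + gain * high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((64, 64))  # white noise: energy spread over all frequencies
    print("high-frequency fraction before:", high_frequency_fraction(feat))
    balanced = rebalance(feat, target_fraction=0.2)
    print("high-frequency fraction after:", high_frequency_fraction(balanced))
```

An ideal (hard) mask is chosen here only because it makes the two bands exactly complementary and easy to reason about. Smoother filters, or learned ones, are more common inside networks because they avoid ringing artifacts.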
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Multimedia |
| DOIs | |
| Publication status | Accepted/In press - 2026 |
| Externally published | Yes |
Keywords
- CNN-Transformer hybrid architecture
- Fourier analysis
- frequency complementary learning
- image fusion
Fingerprint
Research topics of 'Rethinking the CNN-Transformer Hybrid Architecture for Infrared and Visible Image Fusion from a Frequency Perspective'.