TY - JOUR
T1 - MixFuse
T2 - An iterative mix-attention transformer for multi-modal image fusion
AU - Li, Jinfu
AU - Song, Hong
AU - Liu, Lei
AU - Li, Yanan
AU - Xia, Jianghan
AU - Huang, Yuqi
AU - Fan, Jingfan
AU - Lin, Yucong
AU - Yang, Jian
N1 - Publisher Copyright:
© 2024
PY - 2025/2/1
Y1 - 2025/2/1
N2 - Multi-modal image fusion plays a crucial role in various visual systems. However, existing methods typically involve a multi-stage pipeline, i.e., feature extraction, integration, and reconstruction, which limits the effectiveness and efficiency of feature interaction and aggregation. In this paper, we propose MixFuse, a compact Transformer-based multi-modal image fusion framework that seamlessly unifies feature extraction and integration. At its core, the Mix Attention Transformer Block (MATB) integrates the Cross-Attention Transformer Module (CATM) and the Self-Attention Transformer Module (SATM). The CATM introduces a symmetrical cross-attention mechanism to identify modality-specific and general features, filtering out irrelevant and redundant information. Meanwhile, the SATM refines the combined features via a self-attention mechanism, enhancing their internal consistency and preservation. These successive cross- and self-attention modules work together to produce more accurate and refined feature maps, which are essential for the subsequent reconstruction. Extensive evaluation of MixFuse on five public datasets shows its superior performance and adaptability over state-of-the-art methods. The code and model will be released at https://github.com/Bitlijinfu/MixFuse.
AB - Multi-modal image fusion plays a crucial role in various visual systems. However, existing methods typically involve a multi-stage pipeline, i.e., feature extraction, integration, and reconstruction, which limits the effectiveness and efficiency of feature interaction and aggregation. In this paper, we propose MixFuse, a compact Transformer-based multi-modal image fusion framework that seamlessly unifies feature extraction and integration. At its core, the Mix Attention Transformer Block (MATB) integrates the Cross-Attention Transformer Module (CATM) and the Self-Attention Transformer Module (SATM). The CATM introduces a symmetrical cross-attention mechanism to identify modality-specific and general features, filtering out irrelevant and redundant information. Meanwhile, the SATM refines the combined features via a self-attention mechanism, enhancing their internal consistency and preservation. These successive cross- and self-attention modules work together to produce more accurate and refined feature maps, which are essential for the subsequent reconstruction. Extensive evaluation of MixFuse on five public datasets shows its superior performance and adaptability over state-of-the-art methods. The code and model will be released at https://github.com/Bitlijinfu/MixFuse.
KW - Cross-attention transformer
KW - Feature extraction
KW - Feature integration
KW - Multi-modal image fusion
KW - Self-attention transformer
UR - http://www.scopus.com/inward/record.url?scp=85205474841&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2024.125427
DO - 10.1016/j.eswa.2024.125427
M3 - Article
AN - SCOPUS:85205474841
SN - 0957-4174
VL - 261
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 125427
ER -