MGT: Modality-Guided Transformer for Infrared and Visible Image Fusion

Taoying Zhang; Hesong Li; Qiankun Liu; Xiaoyong Wang; Ying Fu

doi:10.1007/978-981-99-8429-9_26

Taoying Zhang, Hesong Li, Qiankun Liu, Xiaoyong Wang^*, Ying Fu

^*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

Infrared and visible image fusion aims to generate high-quality fused images containing thermal radiation information from infrared images and texture information from visible images. Most deep learning-based methods are simple stacks of Transformer or convolution blocks and fail to further integrate the feature information of source images that may be missed in the fusion stage after generating the fused features. In this work, we develop a cross-attention-based macro framework, named Modality-Guided Transformer (MGT), that reintroduces detailed information from the two input images across multiple feature extraction layers into the initially obtained fused image. For efficiency, our MGT also introduces shared attention and multi-scale windows to reduce the computational costs of attention. Experimental results show that the proposed MGT outperforms state-of-the-art methods, especially in preserving salient targets and infrared texture details. Our code is publicly available at https://github.com/TaoYing-Zhang/MGT.
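The reintroduction step described in the abstract can be illustrated with a minimal sketch of cross-attention, in which tokens of an initial fused feature act as queries over infrared and visible feature tokens. This is not the authors' implementation: the function names (`cross_attention`, `modality_guided_refine`), the residual sum, and the toy inputs are assumptions made for illustration. Learned projections, the shared attention, and the multi-scale windows mentioned in the abstract are deliberately left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention on lists of token vectors.

    Each query token attends to all key tokens and returns a weighted
    average of the value tokens. No learned projections are applied
    (assumption made to keep the sketch short).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return out


def modality_guided_refine(fused, infrared, visible):
    """Hypothetical refinement step: fused tokens query each source modality,
    and the two attended contexts are added back to the fused tokens."""
    ir_ctx = cross_attention(fused, infrared, infrared)
    vis_ctx = cross_attention(fused, visible, visible)
    return [
        [f + a + b for f, a, b in zip(f_tok, ir_tok, vis_tok)]
        for f_tok, ir_tok, vis_tok in zip(fused, ir_ctx, vis_ctx)
    ]


# Toy data: two tokens per feature map, two channels per token.
fused = [[0.5, 0.5], [0.1, 0.9]]
infrared = [[1.0, 0.0], [0.0, 1.0]]
visible = [[0.2, 0.8], [0.7, 0.3]]

refined = modality_guided_refine(fused, infrared, visible)
print(refined)
```

The refined output keeps the shape of the fused input, so a step like this could in principle be repeated with source features taken from several extraction layers, which is the pattern the abstract describes.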

Original language	English
Title of host publication	Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings
Editors	Qingshan Liu, Hanzi Wang, Rongrong Ji, Zhanyu Ma, Weishi Zheng, Hongbin Zha, Xilin Chen, Liang Wang
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	321-332
Number of pages	12
ISBN (Print)	9789819984282
DOIs	https://doi.org/10.1007/978-981-99-8429-9_26
Publication status	Published - 2024
Event	6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023 - Xiamen, China, 13 Oct 2023 → 15 Oct 2023

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14425 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023
Country/Territory	China
City	Xiamen
Period	13/10/23 → 15/10/23

Keywords

Cross-attention
Infrared and visible image fusion
Modality-guided
Transformer

Cite this

Zhang, T., Li, H., Liu, Q., Wang, X., & Fu, Y. (2024). MGT: Modality-Guided Transformer for Infrared and Visible Image Fusion. In Q. Liu, H. Wang, R. Ji, Z. Ma, W. Zheng, H. Zha, X. Chen, & L. Wang (Eds.), Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings (pp. 321-332). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14425 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-8429-9_26

Zhang, Taoying ; Li, Hesong ; Liu, Qiankun et al. / MGT : Modality-Guided Transformer for Infrared and Visible Image Fusion. Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings. editor / Qingshan Liu ; Hanzi Wang ; Rongrong Ji ; Zhanyu Ma ; Weishi Zheng ; Hongbin Zha ; Xilin Chen ; Liang Wang. Springer Science and Business Media Deutschland GmbH, 2024. pp. 321-332 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{a5abedf6235b43cb89c494ef4232f2a9,
  title = "MGT: Modality-Guided Transformer for Infrared and Visible Image Fusion",
  abstract = "Infrared and visible image fusion aims to generate high-quality fused images containing thermal radiation information from infrared images and texture information from visible images. Most deep learning-based methods are simple stacks of Transformer or convolution blocks and fail to further integrate the feature information of source images that may be missed in the fusion stage after generating the fused features. In this work, we develop a cross-attention-based macro framework, named Modality-Guided Transformer (MGT), that reintroduces detailed information from the two input images across multiple feature extraction layers into the initially obtained fused image. For efficiency, our MGT also introduces shared attention and multi-scale windows to reduce the computational costs of attention. Experimental results show that the proposed MGT outperforms state-of-the-art methods, especially in preserving salient targets and infrared texture details. Our code is publicly available at https://github.com/TaoYing-Zhang/MGT.",
  keywords = "Cross-attention, Infrared and visible image fusion, Modality-guided, Transformer",
  author = "Taoying Zhang and Hesong Li and Qiankun Liu and Xiaoyong Wang and Ying Fu",
  note = "Publisher Copyright: {\textcopyright} 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.; 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023 ; Conference date: 13-10-2023 Through 15-10-2023",
  year = "2024",
  doi = "10.1007/978-981-99-8429-9_26",
  language = "English",
  isbn = "9789819984282",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  publisher = "Springer Science and Business Media Deutschland GmbH",
  pages = "321--332",
  editor = "Qingshan Liu and Hanzi Wang and Rongrong Ji and Zhanyu Ma and Weishi Zheng and Hongbin Zha and Xilin Chen and Liang Wang",
  booktitle = "Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings",
  address = "Germany",
}

Zhang, T, Li, H, Liu, Q, Wang, X & Fu, Y 2024, MGT: Modality-Guided Transformer for Infrared and Visible Image Fusion. in Q Liu, H Wang, R Ji, Z Ma, W Zheng, H Zha, X Chen & L Wang (eds), Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14425 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 321-332, 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023, Xiamen, China, 13/10/23. https://doi.org/10.1007/978-981-99-8429-9_26

MGT: Modality-Guided Transformer for Infrared and Visible Image Fusion. / Zhang, Taoying; Li, Hesong; Liu, Qiankun et al.
Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings. ed. / Qingshan Liu; Hanzi Wang; Rongrong Ji; Zhanyu Ma; Weishi Zheng; Hongbin Zha; Xilin Chen; Liang Wang. Springer Science and Business Media Deutschland GmbH, 2024. p. 321-332 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14425 LNCS).

TY - GEN
T1 - MGT
T2 - 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023
AU - Zhang, Taoying
AU - Li, Hesong
AU - Liu, Qiankun
AU - Wang, Xiaoyong
AU - Fu, Ying
PY - 2024
Y1 - 2024
AB - Infrared and visible image fusion aims to generate high-quality fused images containing thermal radiation information from infrared images and texture information from visible images. Most deep learning-based methods are simple stacks of Transformer or convolution blocks and fail to further integrate the feature information of source images that may be missed in the fusion stage after generating the fused features. In this work, we develop a cross-attention-based macro framework, named Modality-Guided Transformer (MGT), that reintroduces detailed information from the two input images across multiple feature extraction layers into the initially obtained fused image. For efficiency, our MGT also introduces shared attention and multi-scale windows to reduce the computational costs of attention. Experimental results show that the proposed MGT outperforms state-of-the-art methods, especially in preserving salient targets and infrared texture details. Our code is publicly available at https://github.com/TaoYing-Zhang/MGT.
KW - Cross-attention
KW - Infrared and visible image fusion
KW - Modality-guided
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85180810006&partnerID=8YFLogxK
U2 - 10.1007/978-981-99-8429-9_26
DO - 10.1007/978-981-99-8429-9_26
M3 - Conference contribution
AN - SCOPUS:85180810006
SN - 9789819984282
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 321
EP - 332
BT - Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings
A2 - Liu, Qingshan
A2 - Wang, Hanzi
A2 - Ji, Rongrong
A2 - Ma, Zhanyu
A2 - Zheng, Weishi
A2 - Zha, Hongbin
A2 - Chen, Xilin
A2 - Wang, Liang
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 13 October 2023 through 15 October 2023
ER -

Zhang T, Li H, Liu Q, Wang X, Fu Y. MGT: Modality-Guided Transformer for Infrared and Visible Image Fusion. In Liu Q, Wang H, Ji R, Ma Z, Zheng W, Zha H, Chen X, Wang L, editors, Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings. Springer Science and Business Media Deutschland GmbH. 2024. p. 321-332. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-981-99-8429-9_26