Cross-Modal Attentive Recalibration and Dynamic Fusion for Multispectral Pedestrian Detection

Wei Bao; Jingjing Hu*; Meiyu Huang; Xueshuang Xiang

doi:10.1007/978-981-99-8429-9_40

* Corresponding author for this work

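School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

2 Citations (Scopus)

Abstract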

Multispectral pedestrian detection can provide accurate and reliable results from color-thermal modalities and has drawn much attention. However, how to effectively capture and leverage complementary information from multiple modalities for superior performance remains a core issue. This paper presents a Cross-Modal Attentive Recalibration and Dynamic Fusion Network (CMRF-Net) to adaptively recalibrate and dynamically fuse multi-modal features from multiple perspectives. CMRF-Net consists of a Cross-modal Attentive Feature Recalibration (CAFR) module and a Multi-Modal Dynamic Feature Fusion (MDFF) module in each feature extraction stage. The CAFR module recalibrates features by fully leveraging local and global complementary information in the spatial and channel dimensions, leading to better cross-modal feature alignment and extraction. The MDFF module adopts dynamically learned convolutions to further exploit complementary information in kernel space, enabling more efficient multi-modal feature aggregation. Extensive experiments on three multispectral datasets demonstrate the effectiveness and generalization of the proposed method and its state-of-the-art detection performance. Specifically, CMRF-Net achieves a 2.3% mAP gain over the baseline on the FLIR dataset.
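The abstract describes both modules only at a high level. As a rough, non-authoritative illustration of the two generic ideas it names, the sketch below shows (1) cross-modal channel- and spatial-wise gating, where each modality's features are rescaled by attention computed from both modalities, and (2) dynamic convolution, where K candidate kernels are mixed by input-dependent softmax weights before being applied (the general technique known from CondConv and Dynamic Convolution). It is not the CMRF-Net implementation: the function names `recalibrate` and `dynamic_fuse`, the parameter-free gates, the 1x1 kernels and the `router` weights are hypothetical simplifications chosen to keep the example short and dependency-free.

```python
import math

# Hypothetical, simplified sketch; NOT the authors' CMRF-Net code.
# Feature maps are nested lists shaped [C][H][W] (no batch dimension).


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_avg_pool(feat):
    """Per-channel spatial mean: [C][H][W] -> [C]."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]


def recalibrate(rgb, thermal):
    """Cross-modal gating: a channel gate from the pooled statistics of BOTH
    modalities and a spatial gate from the per-pixel mean over the channels
    of both. Gates are parameter-free here; a trained module would typically
    place learned layers before each sigmoid."""
    C, H, W = len(rgb), len(rgb[0]), len(rgb[0][0])
    ch_gate = [sigmoid(a + b)
               for a, b in zip(global_avg_pool(rgb), global_avg_pool(thermal))]
    sp_gate = [[sigmoid(sum(rgb[c][i][j] + thermal[c][i][j] for c in range(C)) / (2 * C))
                for j in range(W)] for i in range(H)]

    def apply(feat):
        return [[[feat[c][i][j] * ch_gate[c] * sp_gate[i][j] for j in range(W)]
                 for i in range(H)] for c in range(C)]

    return apply(rgb), apply(thermal)


def dynamic_fuse(rgb, thermal, kernels, router):
    """Dynamic 1x1 convolution over the channel-concatenated modalities.
    kernels: K candidate weight matrices, each [C_out][2C] (hypothetical).
    router:  [K][2C] weights turning pooled features into K attention logits.
    The K kernels are mixed with softmax attention into ONE kernel, which is
    then applied at every pixel."""
    x = rgb + thermal  # list concatenation along the channel axis -> [2C][H][W]
    pooled = global_avg_pool(x)
    attn = softmax([sum(r * p for r, p in zip(row, pooled)) for row in router])
    c_out, c_in = len(kernels[0]), len(x)
    mixed = [[sum(a * k[o][i] for a, k in zip(attn, kernels)) for i in range(c_in)]
             for o in range(c_out)]
    H, W = len(x[0]), len(x[0][0])
    out = [[[sum(mixed[o][i] * x[i][h][w] for i in range(c_in)) for w in range(W)]
            for h in range(H)] for o in range(c_out)]
    return out, attn


if __name__ == "__main__":
    rgb = [[[1.0, 2.0], [3.0, 4.0]]]          # C=1, H=W=2
    thermal = [[[0.5, 0.5], [0.5, 0.5]]]
    r, t = recalibrate(rgb, thermal)
    kernels = [[[1.0, 0.0]], [[0.0, 1.0]]]    # K=2 candidates, each [1][2]
    router = [[1.0, 0.0], [0.0, 1.0]]         # one logit per candidate
    fused, attn = dynamic_fuse(r, t, kernels, router)
    print("kernel attention:", attn)
```

Because convolution is linear in its weights, mixing the K kernels first and convolving once gives the same result as convolving with each kernel and averaging the outputs with the same weights, while costing only a single convolution per pixel; this is the usual motivation for dynamic convolution.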

Original language	English
Host publication title	Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings
Editors	Qingshan Liu, Hanzi Wang, Rongrong Ji, Zhanyu Ma, Weishi Zheng, Hongbin Zha, Xilin Chen, Liang Wang
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	499-510
Number of pages	12
ISBN (Print)	9789819984282
DOI	https://doi.org/10.1007/978-981-99-8429-9_40
Publication status	Published - 2024
Event	6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023 - Xiamen, China. Duration: 13 Oct 2023 → 15 Oct 2023

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14425 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023
Country/Territory	China
City	Xiamen
Period	13/10/23 → 15/10/23

Access to Document

10.1007/978-981-99-8429-9_40

Cite this

Bao, W., Hu, J., Huang, M., & Xiang, X. (2024). Cross-Modal Attentive Recalibration and Dynamic Fusion for Multispectral Pedestrian Detection. In Q. Liu, H. Wang, R. Ji, Z. Ma, W. Zheng, H. Zha, X. Chen, & L. Wang (Eds.), Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings (pp. 499-510). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14425 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-8429-9_40

Bao, Wei ; Hu, Jingjing ; Huang, Meiyu et al. / Cross-Modal Attentive Recalibration and Dynamic Fusion for Multispectral Pedestrian Detection. Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings. ed. / Qingshan Liu ; Hanzi Wang ; Rongrong Ji ; Zhanyu Ma ; Weishi Zheng ; Hongbin Zha ; Xilin Chen ; Liang Wang. Springer Science and Business Media Deutschland GmbH, 2024. p. 499-510 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{0a0d3ce9e10044c5b534cd2c2f1e4355,

title = "Cross-Modal Attentive Recalibration and Dynamic Fusion for Multispectral Pedestrian Detection",

abstract = "Multispectral pedestrian detection can provide accurate and reliable results from color-thermal modalities and has drawn much attention. However, how to effectively capture and leverage complementary information from multiple modalities for superior performance is still a core issue. This paper presents a Cross-Modal Attentive Recalibration and Dynamic Fusion Network (CMRF-Net) to adaptively recalibrate and dynamically fuse multi-modal features from multiple perspectives. CMRF-Net consists of a Cross-modal Attentive Feature Recalibration (CAFR) module and a Multi-Modal Dynamic Feature Fusion (MDFF) module in each feature extraction stage. The CAFR module recalibrates features by fully leveraging local and global complementary information in spatial- and channel-wise dimensions, leading to better cross-modal feature alignment and extraction. The MDFF module adopts dynamically learned convolutions to further exploit complementary information in kernel space, enabling more efficient multi-modal feature aggregation. Extensive experiments are conducted on three multispectral datasets to show the effectiveness and generalization of the proposed method and the state-of-the-art detection performance. Specifically, CMRF-Net can achieve 2.3% mAP gains over the baseline on FLIR dataset.",

keywords = "Cross-modal attentive feature recalibration, Multi-modal dynamic feature fusion, Multispectral pedestrian detection",

author = "Wei Bao and Jingjing Hu and Meiyu Huang and Xueshuang Xiang",

note = "Publisher Copyright: {\textcopyright} 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.; 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023 ; Conference date: 13-10-2023 Through 15-10-2023",

year = "2024",

doi = "10.1007/978-981-99-8429-9_40",

language = "English",

isbn = "9789819984282",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "499--510",

editor = "Qingshan Liu and Hanzi Wang and Rongrong Ji and Zhanyu Ma and Weishi Zheng and Hongbin Zha and Xilin Chen and Liang Wang",

booktitle = "Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings",

address = "Germany",

}

Bao, W, Hu, J, Huang, M & Xiang, X 2024, Cross-Modal Attentive Recalibration and Dynamic Fusion for Multispectral Pedestrian Detection. in Q Liu, H Wang, R Ji, Z Ma, W Zheng, H Zha, X Chen & L Wang (eds), Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14425 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 499-510, 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023, Xiamen, China, 13/10/23. https://doi.org/10.1007/978-981-99-8429-9_40

Cross-Modal Attentive Recalibration and Dynamic Fusion for Multispectral Pedestrian Detection. / Bao, Wei; Hu, Jingjing; Huang, Meiyu et al.
Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings. ed. / Qingshan Liu; Hanzi Wang; Rongrong Ji; Zhanyu Ma; Weishi Zheng; Hongbin Zha; Xilin Chen; Liang Wang. Springer Science and Business Media Deutschland GmbH, 2024. p. 499-510 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14425 LNCS).

TY - GEN

T1 - Cross-Modal Attentive Recalibration and Dynamic Fusion for Multispectral Pedestrian Detection

AU - Bao, Wei

AU - Hu, Jingjing

AU - Huang, Meiyu

AU - Xiang, Xueshuang

PY - 2024

Y1 - 2024

AB - Multispectral pedestrian detection can provide accurate and reliable results from color-thermal modalities and has drawn much attention. However, how to effectively capture and leverage complementary information from multiple modalities for superior performance is still a core issue. This paper presents a Cross-Modal Attentive Recalibration and Dynamic Fusion Network (CMRF-Net) to adaptively recalibrate and dynamically fuse multi-modal features from multiple perspectives. CMRF-Net consists of a Cross-modal Attentive Feature Recalibration (CAFR) module and a Multi-Modal Dynamic Feature Fusion (MDFF) module in each feature extraction stage. The CAFR module recalibrates features by fully leveraging local and global complementary information in spatial- and channel-wise dimensions, leading to better cross-modal feature alignment and extraction. The MDFF module adopts dynamically learned convolutions to further exploit complementary information in kernel space, enabling more efficient multi-modal feature aggregation. Extensive experiments are conducted on three multispectral datasets to show the effectiveness and generalization of the proposed method and the state-of-the-art detection performance. Specifically, CMRF-Net can achieve 2.3% mAP gains over the baseline on FLIR dataset.

KW - Cross-modal attentive feature recalibration

KW - Multi-modal dynamic feature fusion

KW - Multispectral pedestrian detection

UR - http://www.scopus.com/inward/record.url?scp=85180752282&partnerID=8YFLogxK

U2 - 10.1007/978-981-99-8429-9_40

DO - 10.1007/978-981-99-8429-9_40

M3 - Conference contribution

AN - SCOPUS:85180752282

SN - 9789819984282

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 499

EP - 510

BT - Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings

A2 - Liu, Qingshan

A2 - Wang, Hanzi

A2 - Ji, Rongrong

A2 - Ma, Zhanyu

A2 - Zheng, Weishi

A2 - Zha, Hongbin

A2 - Chen, Xilin

A2 - Wang, Liang

PB - Springer Science and Business Media Deutschland GmbH

T2 - 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023

Y2 - 13 October 2023 through 15 October 2023

ER -

Bao W, Hu J, Huang M, Xiang X. Cross-Modal Attentive Recalibration and Dynamic Fusion for Multispectral Pedestrian Detection. In Liu Q, Wang H, Ji R, Ma Z, Zheng W, Zha H, Chen X, Wang L, editors, Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings. Springer Science and Business Media Deutschland GmbH. 2024. p. 499-510. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-981-99-8429-9_40
