TY - GEN
T1 - Cross-Modal Attentive Recalibration and Dynamic Fusion for Multispectral Pedestrian Detection
AU - Bao, Wei
AU - Hu, Jingjing
AU - Huang, Meiyu
AU - Xiang, Xueshuang
N1 - Publisher Copyright:
© 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
PY - 2024
Y1 - 2024
N2 - Multispectral pedestrian detection can provide accurate and reliable results from color-thermal modalities and has drawn much attention. However, effectively capturing and leveraging complementary information across modalities for superior performance remains a core challenge. This paper presents a Cross-Modal Attentive Recalibration and Dynamic Fusion Network (CMRF-Net) that adaptively recalibrates and dynamically fuses multi-modal features from multiple perspectives. CMRF-Net consists of a Cross-Modal Attentive Feature Recalibration (CAFR) module and a Multi-Modal Dynamic Feature Fusion (MDFF) module in each feature extraction stage. The CAFR module recalibrates features by fully leveraging local and global complementary information along the spatial and channel dimensions, leading to better cross-modal feature alignment and extraction. The MDFF module adopts dynamically learned convolutions to further exploit complementary information in kernel space, enabling more efficient multi-modal feature aggregation. Extensive experiments on three multispectral datasets demonstrate the effectiveness and generalization ability of the proposed method, as well as its state-of-the-art detection performance; in particular, CMRF-Net achieves a 2.3% mAP gain over the baseline on the FLIR dataset.
AB - Multispectral pedestrian detection can provide accurate and reliable results from color-thermal modalities and has drawn much attention. However, effectively capturing and leveraging complementary information across modalities for superior performance remains a core challenge. This paper presents a Cross-Modal Attentive Recalibration and Dynamic Fusion Network (CMRF-Net) that adaptively recalibrates and dynamically fuses multi-modal features from multiple perspectives. CMRF-Net consists of a Cross-Modal Attentive Feature Recalibration (CAFR) module and a Multi-Modal Dynamic Feature Fusion (MDFF) module in each feature extraction stage. The CAFR module recalibrates features by fully leveraging local and global complementary information along the spatial and channel dimensions, leading to better cross-modal feature alignment and extraction. The MDFF module adopts dynamically learned convolutions to further exploit complementary information in kernel space, enabling more efficient multi-modal feature aggregation. Extensive experiments on three multispectral datasets demonstrate the effectiveness and generalization ability of the proposed method, as well as its state-of-the-art detection performance; in particular, CMRF-Net achieves a 2.3% mAP gain over the baseline on the FLIR dataset.
KW - Cross-modal attentive feature recalibration
KW - Multi-modal dynamic feature fusion
KW - Multispectral pedestrian detection
UR - http://www.scopus.com/inward/record.url?scp=85180752282&partnerID=8YFLogxK
U2 - 10.1007/978-981-99-8429-9_40
DO - 10.1007/978-981-99-8429-9_40
M3 - Conference contribution
AN - SCOPUS:85180752282
SN - 9789819984282
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 499
EP - 510
BT - Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings
A2 - Liu, Qingshan
A2 - Wang, Hanzi
A2 - Ji, Rongrong
A2 - Ma, Zhanyu
A2 - Zheng, Weishi
A2 - Zha, Hongbin
A2 - Chen, Xilin
A2 - Wang, Liang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023
Y2 - 13 October 2023 through 15 October 2023
ER -