Cross-Modal Attentive Recalibration and Dynamic Fusion for Multispectral Pedestrian Detection

Wei Bao, Jingjing Hu*, Meiyu Huang, Xueshuang Xiang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Citation (Scopus)

Abstract

Multispectral pedestrian detection, which combines color and thermal modalities, can provide accurate and reliable results and has drawn much attention. However, how to effectively capture and exploit complementary information from multiple modalities remains a core issue. This paper presents a Cross-Modal Attentive Recalibration and Dynamic Fusion Network (CMRF-Net) that adaptively recalibrates and dynamically fuses multi-modal features from multiple perspectives. CMRF-Net places a Cross-modal Attentive Feature Recalibration (CAFR) module and a Multi-Modal Dynamic Feature Fusion (MDFF) module in each feature extraction stage. The CAFR module recalibrates features by leveraging local and global complementary information along the spatial and channel dimensions, leading to better cross-modal feature alignment and extraction. The MDFF module adopts dynamically learned convolutions to further exploit complementary information in kernel space, enabling more efficient multi-modal feature aggregation. Extensive experiments on three multispectral datasets demonstrate the effectiveness and generalization of the proposed method, as well as its state-of-the-art detection performance. Specifically, CMRF-Net achieves a 2.3% mAP gain over the baseline on the FLIR dataset.
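
The abstract names two ideas: recalibrating each modality's features using cues from the other modality, and fusing them with convolutions whose kernels depend on the input. The NumPy sketch below illustrates generic versions of both, and it is not the paper's CAFR or MDFF implementation. Everything in it is an assumption made for illustration: the function names `channel_recalibrate` and `dynamic_fuse`, the residual sigmoid gating, the mixture-of-kernels form of dynamic convolution, the 1x1 kernel size, and the random stand-ins for learned weights.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def channel_recalibrate(rgb, thermal, w_rgb, w_thermal):
    """Gate each modality's channels with the other modality's global descriptor.

    rgb, thermal: feature maps of shape (C, H, W).
    w_rgb, w_thermal: hypothetical learned (C, C) projections.
    """
    g_rgb = rgb.mean(axis=(1, 2))          # global average pooling -> (C,)
    g_thermal = thermal.mean(axis=(1, 2))  # (C,)
    gate_rgb = sigmoid(w_rgb @ g_thermal)      # thermal context gates color
    gate_thermal = sigmoid(w_thermal @ g_rgb)  # color context gates thermal
    # Residual form (an assumption): keep the original features, add gated ones.
    rgb_out = rgb * (1.0 + gate_rgb[:, None, None])
    thermal_out = thermal * (1.0 + gate_thermal[:, None, None])
    return rgb_out, thermal_out


def dynamic_fuse(rgb, thermal, kernels, router):
    """Fuse two modalities with an input-dependent 1x1 convolution.

    kernels: (K, C, 2C) candidate 1x1 kernels (hypothetical learned weights).
    router:  (K, 2C) projection that scores each candidate kernel.
    """
    x = np.concatenate([rgb, thermal], axis=0)   # (2C, H, W)
    pooled = x.mean(axis=(1, 2))                 # (2C,)
    mix = softmax(router @ pooled)               # (K,) weights, sum to 1
    kernel = np.tensordot(mix, kernels, axes=1)  # (C, 2C) per-input kernel
    return np.einsum("oc,chw->ohw", kernel, x)   # (C, H, W)


# Toy usage with random arrays standing in for backbone features and weights.
rng = np.random.default_rng(0)
C, H, W, K = 4, 8, 8, 3
rgb = rng.standard_normal((C, H, W))
thermal = rng.standard_normal((C, H, W))
rgb_r, thermal_r = channel_recalibrate(
    rgb, thermal, rng.standard_normal((C, C)), rng.standard_normal((C, C))
)
fused = dynamic_fuse(
    rgb_r, thermal_r, rng.standard_normal((K, C, 2 * C)), rng.standard_normal((K, 2 * C))
)
print(fused.shape)  # (4, 8, 8)
```

In this sketch the fusion kernel is a softmax-weighted mixture of `K` candidate kernels, so the effective convolution changes with every input pair. That is one common way to realize "dynamically learned convolutions"; the paper's MDFF module may be built differently.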

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings
Editors: Qingshan Liu, Hanzi Wang, Rongrong Ji, Zhanyu Ma, Weishi Zheng, Hongbin Zha, Xilin Chen, Liang Wang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 499-510
Number of pages: 12
ISBN (Print): 9789819984282
Publication status: Published - 2024
Event: 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023 - Xiamen, China
Duration: 13 Oct 2023 to 15 Oct 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14425 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023
Country/Territory: China
City: Xiamen
Period: 13/10/23 to 15/10/23

Keywords

  • Cross-modal attentive feature recalibration
  • Multi-modal dynamic feature fusion
  • Multispectral pedestrian detection
