TY - GEN
T1 - Attention-Guided Multi-modal and Multi-scale Fusion for Multispectral Pedestrian Detection
AU - Bao, Wei
AU - Huang, Meiyu
AU - Hu, Jingjing
AU - Xiang, Xueshuang
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Multispectral pedestrian detection provides more accurate and reliable detection results by leveraging complementary information from color-thermal modalities and has drawn much attention for open-world applications. Much progress has been made in feature-level detection methods, which aim to effectively fuse the multispectral features extracted by convolutional neural networks. However, existing methods mainly focus on information integration between same-level feature maps and ignore the complementary local features scattered across multi-scale layers. In this paper, we introduce an Attention-guided multi-Modal and multi-Scale Fusion (AMSF) module that simultaneously samples complementary local features scattered across multi-modal and multi-scale layers and adaptively aggregates them with fine-grained attention, fully exploiting the different modalities for better multi-scale detection results. Extensive experiments on three multispectral datasets and three representative deep-learning-based detection benchmarks demonstrate the effectiveness and generalization of the proposed method, as well as its state-of-the-art detection performance.
AB - Multispectral pedestrian detection provides more accurate and reliable detection results by leveraging complementary information from color-thermal modalities and has drawn much attention for open-world applications. Much progress has been made in feature-level detection methods, which aim to effectively fuse the multispectral features extracted by convolutional neural networks. However, existing methods mainly focus on information integration between same-level feature maps and ignore the complementary local features scattered across multi-scale layers. In this paper, we introduce an Attention-guided multi-Modal and multi-Scale Fusion (AMSF) module that simultaneously samples complementary local features scattered across multi-modal and multi-scale layers and adaptively aggregates them with fine-grained attention, fully exploiting the different modalities for better multi-scale detection results. Extensive experiments on three multispectral datasets and three representative deep-learning-based detection benchmarks demonstrate the effectiveness and generalization of the proposed method, as well as its state-of-the-art detection performance.
KW - Fine-grained attention
KW - Multi-modal and Multi-scale fusion
KW - Multispectral pedestrian detection
UR - http://www.scopus.com/inward/record.url?scp=85142674683&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-18907-4_30
DO - 10.1007/978-3-031-18907-4_30
M3 - Conference contribution
AN - SCOPUS:85142674683
SN - 9783031189067
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 382
EP - 393
BT - Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings
A2 - Yu, Shiqi
A2 - Zhang, Jianguo
A2 - Zhang, Zhaoxiang
A2 - Tan, Tieniu
A2 - Yuen, Pong C.
A2 - Guo, Yike
A2 - Han, Junwei
A2 - Lai, Jianhuang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022
Y2 - 4 November 2022 through 7 November 2022
ER -