Attention-Guided Multi-modal and Multi-scale Fusion for Multispectral Pedestrian Detection

Wei Bao; Meiyu Huang; Jingjing Hu; Xueshuang Xiang

doi:10.1007/978-3-031-18907-4_30

Wei Bao, Meiyu Huang^*, Jingjing Hu, Xueshuang Xiang

^*Corresponding author for this work

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

1 Citation (Scopus)

Abstract

Multispectral pedestrian detection provides more accurate and reliable detection results by leveraging complementary information from color-thermal modalities and has drawn much attention in the open world. Much progress has been made in feature-level detection methods, which aim to effectively fuse the multispectral features extracted by convolutional neural networks. However, existing methods mainly focus on information integration between same-level feature maps and ignore the complementary local features scattered across multi-scale layers. In this paper, we introduce an Attention-guided multi-Modal and multi-Scale Fusion (AMSF) module to simultaneously sample complementary local features scattered in multi-modal and multi-scale layers, and adaptively aggregate them with fine-grained attention to fully exploit different modalities for better multi-scale detection results. Extensive experiments on three multispectral datasets and three representative deep-learning-based detection benchmarks demonstrate the effectiveness and generalization of the proposed method, as well as its state-of-the-art detection performance.

Original language	English
Title of host publication	Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings
Editors	Shiqi Yu, Jianguo Zhang, Zhaoxiang Zhang, Tieniu Tan, Pong C. Yuen, Yike Guo, Junwei Han, Jianhuang Lai
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	382-393
Number of pages	12
ISBN (Print)	9783031189067
DOI	https://doi.org/10.1007/978-3-031-18907-4_30
Publication status	Published - 2022
Event	5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022 - Shenzhen, China. Duration: 4 Nov 2022 → 7 Nov 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13534 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022
Country/Territory	China
City	Shenzhen
Period	4 Nov 2022 → 7 Nov 2022

Access to document

10.1007/978-3-031-18907-4_30

Other files and links

Link to publication in Scopus

Cite this

Bao, W., Huang, M., Hu, J., & Xiang, X. (2022). Attention-Guided Multi-modal and Multi-scale Fusion for Multispectral Pedestrian Detection. In S. Yu, J. Zhang, Z. Zhang, T. Tan, P. C. Yuen, Y. Guo, J. Han, & J. Lai (Eds.), Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings (pp. 382-393). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13534 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-18907-4_30

Bao, Wei ; Huang, Meiyu ; Hu, Jingjing et al. / Attention-Guided Multi-modal and Multi-scale Fusion for Multispectral Pedestrian Detection. Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings. editor / Shiqi Yu ; Jianguo Zhang ; Zhaoxiang Zhang ; Tieniu Tan ; Pong C. Yuen ; Yike Guo ; Junwei Han ; Jianhuang Lai. Springer Science and Business Media Deutschland GmbH, 2022. pp. 382-393 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{b7453e4c4f5e408db61df9edf88aa242,

title = "Attention-Guided Multi-modal and Multi-scale Fusion for Multispectral Pedestrian Detection",

abstract = "Multispectral pedestrian detection provides more accurate and reliable detection results by leveraging complementary information from color-thermal modalities and has drawn much attention in the open world. Much progress has been made in the feature-level-based detection methods which aim to effectively fuse the multispectral features extracted by the convolution neural networks. However, existing methods mainly focus on the information integration between the same-level feature maps and ignore the complementary local features scattered in multi-scale layers. In this paper, we introduce an Attention-guided multi-Modal and multi-Scale Fusion (AMSF) module to simultaneously sample complementary local features scattered in multi-modal and multi-scale layers, and adaptively aggregate them with fine-grained attention to fully exploit different modalities for better multi-scale detection results. Extensive experiments are conducted on three multispectral datasets and three representative deep-learning-based detection benchmarks to show the effectiveness and generalization of the proposed method, and the state-of-the-art detection performance.",

keywords = "Fine-grained attention, Multi-modal and Multi-scale fusion, Multispectral pedestrian detection",

author = "Wei Bao and Meiyu Huang and Jingjing Hu and Xueshuang Xiang",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022 ; Conference date: 04-11-2022 Through 07-11-2022",

year = "2022",

doi = "10.1007/978-3-031-18907-4_30",

language = "English",

isbn = "9783031189067",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "382--393",

editor = "Shiqi Yu and Jianguo Zhang and Zhaoxiang Zhang and Tieniu Tan and Yuen, {Pong C.} and Yike Guo and Junwei Han and Jianhuang Lai",

booktitle = "Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings",

address = "Germany",

}

Bao, W, Huang, M, Hu, J & Xiang, X 2022, Attention-Guided Multi-modal and Multi-scale Fusion for Multispectral Pedestrian Detection. in S Yu, J Zhang, Z Zhang, T Tan, PC Yuen, Y Guo, J Han & J Lai (eds), Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13534 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 382-393, 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022, Shenzhen, China, 4 Nov 2022. https://doi.org/10.1007/978-3-031-18907-4_30

Attention-Guided Multi-modal and Multi-scale Fusion for Multispectral Pedestrian Detection. / Bao, Wei; Huang, Meiyu; Hu, Jingjing et al.
Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings. editor / Shiqi Yu; Jianguo Zhang; Zhaoxiang Zhang; Tieniu Tan; Pong C. Yuen; Yike Guo; Junwei Han; Jianhuang Lai. Springer Science and Business Media Deutschland GmbH, 2022. pp. 382-393 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13534 LNCS).

TY - GEN

T1 - Attention-Guided Multi-modal and Multi-scale Fusion for Multispectral Pedestrian Detection

AU - Bao, Wei

AU - Huang, Meiyu

AU - Hu, Jingjing

AU - Xiang, Xueshuang

PY - 2022

Y1 - 2022

N2 - Multispectral pedestrian detection provides more accurate and reliable detection results by leveraging complementary information from color-thermal modalities and has drawn much attention in the open world. Much progress has been made in the feature-level-based detection methods which aim to effectively fuse the multispectral features extracted by the convolution neural networks. However, existing methods mainly focus on the information integration between the same-level feature maps and ignore the complementary local features scattered in multi-scale layers. In this paper, we introduce an Attention-guided multi-Modal and multi-Scale Fusion (AMSF) module to simultaneously sample complementary local features scattered in multi-modal and multi-scale layers, and adaptively aggregate them with fine-grained attention to fully exploit different modalities for better multi-scale detection results. Extensive experiments are conducted on three multispectral datasets and three representative deep-learning-based detection benchmarks to show the effectiveness and generalization of the proposed method, and the state-of-the-art detection performance.

AB - Multispectral pedestrian detection provides more accurate and reliable detection results by leveraging complementary information from color-thermal modalities and has drawn much attention in the open world. Much progress has been made in the feature-level-based detection methods which aim to effectively fuse the multispectral features extracted by the convolution neural networks. However, existing methods mainly focus on the information integration between the same-level feature maps and ignore the complementary local features scattered in multi-scale layers. In this paper, we introduce an Attention-guided multi-Modal and multi-Scale Fusion (AMSF) module to simultaneously sample complementary local features scattered in multi-modal and multi-scale layers, and adaptively aggregate them with fine-grained attention to fully exploit different modalities for better multi-scale detection results. Extensive experiments are conducted on three multispectral datasets and three representative deep-learning-based detection benchmarks to show the effectiveness and generalization of the proposed method, and the state-of-the-art detection performance.

KW - Fine-grained attention

KW - Multi-modal and Multi-scale fusion

KW - Multispectral pedestrian detection

UR - http://www.scopus.com/inward/record.url?scp=85142674683&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-18907-4_30

DO - 10.1007/978-3-031-18907-4_30

M3 - Conference contribution

AN - SCOPUS:85142674683

SN - 9783031189067

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 382

EP - 393

BT - Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings

A2 - Yu, Shiqi

A2 - Zhang, Jianguo

A2 - Zhang, Zhaoxiang

A2 - Tan, Tieniu

A2 - Yuen, Pong C.

A2 - Guo, Yike

A2 - Han, Junwei

A2 - Lai, Jianhuang

PB - Springer Science and Business Media Deutschland GmbH

T2 - 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022

Y2 - 4 November 2022 through 7 November 2022

ER -

Bao W, Huang M, Hu J, Xiang X. Attention-Guided Multi-modal and Multi-scale Fusion for Multispectral Pedestrian Detection. In Yu S, Zhang J, Zhang Z, Tan T, Yuen PC, Guo Y, Han J, Lai J, editors, Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 382-393. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-18907-4_30
