HAFNet: Hierarchical Attentive Fusion Network for Multispectral Pedestrian Detection

Peiran Peng, Tingfa Xu, Bo Huang, Jianan Li*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Multispectral pedestrian detection via visible and thermal image pairs has received widespread attention in recent years. It provides a promising multi-modality solution to the challenges of pedestrian detection in low-light environments and under occlusion. Most existing methods directly blend the results of the two modalities or combine the visible and thermal features via linear interpolation. However, such fusion strategies tend to extract coarse features at corresponding positions across the two modalities, which may degrade detection performance. To mitigate this, this paper proposes a novel, adaptive cross-modality fusion framework, named Hierarchical Attentive Fusion Network (HAFNet), which fully exploits multispectral attention knowledge to guide pedestrian detection in the decision-making process. Concretely, we introduce a Hierarchical Content-dependent Attentive Fusion (HCAF) module, which uses top-level features as a guide for pixel-wise blending of the two modalities' features to enhance the quality of the feature representation, and a plug-in Multi-modality Feature Alignment (MFA) block, which fine-tunes the alignment of features across the two modalities. Experiments on the challenging KAIST and CVC-14 datasets demonstrate the superior performance of our method at a satisfactory speed.
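The abstract describes content-dependent, pixel-wise blending of visible and thermal features guided by top-level features, in contrast to fixed linear interpolation. Below is a minimal PyTorch sketch of one such attentive fusion step; the module name AttentiveFusion, the layer choices, and the single-channel blending weight are illustrative assumptions based only on the abstract, not the authors' HCAF implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveFusion(nn.Module):
    """Pixel-wise attentive fusion of visible and thermal feature maps.

    A top-level (semantically coarse) guide feature is upsampled and used to
    predict a per-pixel weight that blends the two modalities, instead of
    mixing them with a fixed linear interpolation.
    """

    def __init__(self, channels: int, top_channels: int):
        super().__init__()
        # Predict a single-channel blending weight from the top-level guide
        # concatenated with both modality features (an assumed design).
        self.weight_head = nn.Sequential(
            nn.Conv2d(2 * channels + top_channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, vis: torch.Tensor, thermal: torch.Tensor,
                top: torch.Tensor) -> torch.Tensor:
        # Resize the top-level guide to the spatial size of this level.
        guide = F.interpolate(top, size=vis.shape[-2:], mode="bilinear",
                              align_corners=False)
        # Per-pixel weight alpha in [0, 1]: alpha * visible + (1 - alpha) * thermal.
        alpha = self.weight_head(torch.cat([vis, thermal, guide], dim=1))
        return alpha * vis + (1.0 - alpha) * thermal

if __name__ == "__main__":
    fuse = AttentiveFusion(channels=256, top_channels=512)
    vis = torch.randn(1, 256, 64, 80)      # visible-branch feature map
    thermal = torch.randn(1, 256, 64, 80)  # thermal-branch feature map
    top = torch.randn(1, 512, 16, 20)      # top-level guide feature
    print(fuse(vis, thermal, top).shape)   # torch.Size([1, 256, 64, 80])
```

Because the weight is computed from the feature content at each location, the fusion can favor the thermal branch in dark regions and the visible branch elsewhere, which is the behavior the abstract attributes to content-dependent fusion.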

Original language: English
Article number: 2041
Journal: Remote Sensing
Volume: 15
Issue number: 8
DOIs
Publication status: Published - Apr 2023

Keywords

  • content-dependent
  • feature alignment
  • multispectral pedestrian detection
