HAFNet: Hierarchical Attentive Fusion Network for Multispectral Pedestrian Detection

Peiran Peng, Tingfa Xu, Bo Huang, Jianan Li*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Multispectral pedestrian detection via visible and thermal image pairs has received widespread attention in recent years. It provides a promising multi-modality solution to the challenges of pedestrian detection in low-light environments and under occlusion. Most existing methods directly blend the results of the two modalities or combine the visible and thermal features via linear interpolation. However, such fusion strategies tend to extract coarse features at corresponding positions across the two modalities, which may degrade detection performance. To mitigate this, this paper proposes a novel, adaptive cross-modality fusion framework, named Hierarchical Attentive Fusion Network (HAFNet), which fully exploits multispectral attention knowledge to guide pedestrian detection in the decision-making process. Concretely, we introduce a Hierarchical Content-dependent Attentive Fusion (HCAF) module, which uses top-level features as a guide for pixel-wise blending of the two modalities' features to enhance the quality of the feature representation, and a plug-in Multi-modality Feature Alignment (MFA) block, which fine-tunes the alignment of features across the two modalities. Experiments on the challenging KAIST and CVC-14 datasets demonstrate the superior performance of our method at a satisfactory speed.
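The abstract describes content-dependent, pixel-wise blending of visible and thermal features guided by top-level features, in contrast to fixed linear interpolation. Below is a minimal PyTorch sketch of one such attentive fusion step; the module name AttentiveFusion, the layer choices, and the single-channel blending weight are illustrative assumptions based only on the abstract, not the authors' HCAF implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveFusion(nn.Module):
    """Pixel-wise attentive fusion of visible and thermal feature maps.

    A top-level (semantically coarse) guide feature is upsampled and used to
    predict a per-pixel weight that blends the two modalities, instead of
    mixing them with a fixed linear interpolation.
    """

    def __init__(self, channels: int, top_channels: int):
        super().__init__()
        # Predict a single-channel blending weight from the top-level guide
        # concatenated with both modality features (an assumed design).
        self.weight_head = nn.Sequential(
            nn.Conv2d(2 * channels + top_channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, vis: torch.Tensor, thermal: torch.Tensor,
                top: torch.Tensor) -> torch.Tensor:
        # Resize the top-level guide to the spatial size of this level.
        guide = F.interpolate(top, size=vis.shape[-2:], mode="bilinear",
                              align_corners=False)
        # Per-pixel weight alpha in [0, 1]: alpha * visible + (1 - alpha) * thermal.
        alpha = self.weight_head(torch.cat([vis, thermal, guide], dim=1))
        return alpha * vis + (1.0 - alpha) * thermal

if __name__ == "__main__":
    fuse = AttentiveFusion(channels=256, top_channels=512)
    vis = torch.randn(1, 256, 64, 80)      # visible-branch feature map
    thermal = torch.randn(1, 256, 64, 80)  # thermal-branch feature map
    top = torch.randn(1, 512, 16, 20)      # top-level guide feature
    print(fuse(vis, thermal, top).shape)   # torch.Size([1, 256, 64, 80])
```

Because the weight is computed from the feature content at each location, the fusion can favor the thermal branch in dark regions and the visible branch elsewhere, which is the behavior the abstract attributes to content-dependent fusion.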

Original language: English
Article number: 2041
Journal: Remote Sensing
Volume: 15
Issue number: 8
DOIs
Publication status: Published - Apr 2023

Keywords

  • content-dependent
  • feature alignment
  • multispectral pedestrian detection
