SFAF-MA: Spatial Feature Aggregation and Fusion with Modality Adaptation for RGB-Thermal Semantic Segmentation

Xunjie He, Meiling Wang, Tong Liu*, Lin Zhao, Yufeng Yue

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

18 Citations (Scopus)

Abstract

The fusion of red, green, blue (RGB) and thermal images has profound implications for the semantic segmentation of challenging urban scenes, such as those with poor illumination. Nevertheless, existing RGB-Thermal (RGB-T) fusion networks pay less attention to modality differences, i.e., RGB and thermal images are commonly fused with fixed weights. In addition, spatial context details are lost during regular extraction operations, inevitably leading to imprecise object segmentation. To improve the segmentation accuracy, a novel network named spatial feature aggregation and fusion with modality adaptation (SFAF-MA) is proposed in this article. The modality difference adaptive fusion (MDAF) module is introduced to adaptively fuse RGB and thermal images with corresponding weights generated from an attention mechanism. In addition, the spatial semantic fusion (SSF) module is designed to tap into more information by capturing multiscale perceptive fields with dilated convolutions of different rates, and aggregate shallower-level features with rich visual information and deeper-level features with strong semantics. Compared with existing methods on the public MFNet dataset and PST900 dataset, the proposed network significantly improves the segmentation effectiveness. The code is available at https://github.com/hexunjie/SFAF-MA.
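
The two ideas named in the abstract, per-sample fusion weights instead of fixed ones and dilated convolutions at several rates, can be illustrated with a toy sketch. It does not reproduce the MDAF or SSF modules: the scoring rule (a softmax over globally pooled activations), the function names, and the rates (1, 2, 4) are all assumptions made for illustration, and the published code at the repository above should be consulted for the actual design.

```python
import math

def global_avg_pool(feat):
    """Average each channel of a C x H x W nested-list feature map to one scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def modality_weights(rgb, thermal):
    """Per-channel softmax over the two pooled descriptors.

    Assumption: a stand-in for an attention branch, which would normally
    contain trainable layers rather than a plain softmax.
    """
    weights = []
    for r, t in zip(global_avg_pool(rgb), global_avg_pool(thermal)):
        m = max(r, t)  # subtract the max for numerical stability
        er, et = math.exp(r - m), math.exp(t - m)
        weights.append((er / (er + et), et / (er + et)))
    return weights

def adaptive_fuse(rgb, thermal):
    """Fuse two feature maps with per-channel weights instead of a fixed sum."""
    fused = []
    for (w_r, w_t), ch_r, ch_t in zip(modality_weights(rgb, thermal), rgb, thermal):
        fused.append([[w_r * a + w_t * b for a, b in zip(row_r, row_t)]
                      for row_r, row_t in zip(ch_r, ch_t)])
    return fused

def effective_kernel(k, rate):
    """Span covered by a k-tap kernel dilated by `rate`: k + (k - 1) * (rate - 1)."""
    return k + (k - 1) * (rate - 1)

# Toy 1-channel, 2x2 input: a dark RGB frame next to a strong thermal response.
rgb = [[[0.0, 0.0], [0.0, 0.0]]]
thermal = [[[2.0, 2.0], [2.0, 2.0]]]

weights = modality_weights(rgb, thermal)   # thermal receives the larger weight
fused = adaptive_fuse(rgb, thermal)
spans = [effective_kernel(3, r) for r in (1, 2, 4)]  # hypothetical rates
```

In this toy case the thermal channel dominates the fused output because the RGB activations are zero, which mirrors the poorly illuminated scenes the abstract mentions. The `spans` list shows how a 3-tap kernel covers a wider window as the dilation rate grows, without adding parameters.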

Original language: English
Article number: 5012810
Journal: IEEE Transactions on Instrumentation and Measurement
Volume: 72
Publication status: Published - 2023

Keywords

  • Attention mechanism
  • RGB-T semantic segmentation
  • multimodal fusion
  • spatial feature aggregation
