TY - JOUR
T1 - SFAF-MA: Spatial Feature Aggregation and Fusion with Modality Adaptation for RGB-Thermal Semantic Segmentation
AU - He, Xunjie
AU - Wang, Meiling
AU - Liu, Tong
AU - Zhao, Lin
AU - Yue, Yufeng
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
AB - The fusion of red, green, and blue (RGB) and thermal images has profound implications for the semantic segmentation of challenging urban scenes, such as those with poor illumination. Nevertheless, existing RGB-thermal (RGB-T) fusion networks pay little attention to modality differences; that is, RGB and thermal images are commonly fused with fixed weights. In addition, spatial context details are lost during regular feature-extraction operations, inevitably leading to imprecise object segmentation. To improve segmentation accuracy, a novel network named spatial feature aggregation and fusion with modality adaptation (SFAF-MA) is proposed in this article. The modality difference adaptive fusion (MDAF) module is introduced to adaptively fuse RGB and thermal images with weights generated by an attention mechanism. In addition, the spatial semantic fusion (SSF) module is designed to capture multiscale receptive fields with dilated convolutions of different rates and to aggregate shallow-level features rich in visual detail with deep-level features carrying strong semantics. Compared with existing methods on the public MFNet and PST900 datasets, the proposed network significantly improves segmentation accuracy. The code is available at https://github.com/hexunjie/SFAF-MA.
KW - attention mechanism
KW - RGB-T semantic segmentation
KW - multimodal fusion
KW - spatial feature aggregation
UR - http://www.scopus.com/inward/record.url?scp=85153509843&partnerID=8YFLogxK
DO - 10.1109/TIM.2023.3267529
M3 - Article
AN - SCOPUS:85153509843
SN - 0018-9456
VL - 72
JO - IEEE Trans. Instrum. Meas.
JF - IEEE Transactions on Instrumentation and Measurement
M1 - 5012810
ER -