TY - JOUR
T1 - Enhanced Spectral–Spatial Fusion Network for Multispectral Object Detection in Ground-Aerial Images
AU - Xu, Fengxiang
AU - Xu, Tingfa
AU - Hong, Lang
AU - Peng, Peiran
AU - Guo, Jiaxin
AU - Li, Jianan
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In recent years, multispectral object detection has gained widespread attention for its strong performance in detecting objects with similar colors or textures in complex environments. Mainstream methods fuse visible-light (RGB) and thermal images to compensate for the shortcomings of a single modality, improving detection accuracy. However, most methods fail to account for the inherent differences between modalities. RGB and thermal images differ in object attributes, which may lead to inconsistent contributions of each modality to the fused features. Hence, feeding them equally into the feature extractor limits the expressiveness of the fused features. To reasonably exploit the complementary cues of each modality, an effective cross-modality feature fusion network is proposed in this letter. It comprises a spectral–spatial enhancing (SSE) module and a feature fusion module via Transformer [fast Fourier transform (FFT)]. For dual-modality data on the aerial dataset, our model improves the detection accuracy metrics mAP50 and mean average precision (mAP) by 1.2% and 0.9%, respectively, over the best dual-modality network. Comprehensive experiments on both ground and aerial datasets demonstrate that our approach outperforms existing methods. These achievements are significant for enhancing the robustness and accuracy of multispectral object detection.
AB - In recent years, multispectral object detection has gained widespread attention for its strong performance in detecting objects with similar colors or textures in complex environments. Mainstream methods fuse visible-light (RGB) and thermal images to compensate for the shortcomings of a single modality, improving detection accuracy. However, most methods fail to account for the inherent differences between modalities. RGB and thermal images differ in object attributes, which may lead to inconsistent contributions of each modality to the fused features. Hence, feeding them equally into the feature extractor limits the expressiveness of the fused features. To reasonably exploit the complementary cues of each modality, an effective cross-modality feature fusion network is proposed in this letter. It comprises a spectral–spatial enhancing (SSE) module and a feature fusion module via Transformer [fast Fourier transform (FFT)]. For dual-modality data on the aerial dataset, our model improves the detection accuracy metrics mAP50 and mean average precision (mAP) by 1.2% and 0.9%, respectively, over the best dual-modality network. Comprehensive experiments on both ground and aerial datasets demonstrate that our approach outperforms existing methods. These achievements are significant for enhancing the robustness and accuracy of multispectral object detection.
KW - Cross-modality
KW - feature fusion
KW - multispectral object detection
KW - spectral–spatial
UR - https://www.scopus.com/pages/publications/85202758133
U2 - 10.1109/LGRS.2024.3440045
DO - 10.1109/LGRS.2024.3440045
M3 - Article
AN - SCOPUS:85202758133
SN - 1545-598X
VL - 21
JO - IEEE Geoscience and Remote Sensing Letters
JF - IEEE Geoscience and Remote Sensing Letters
M1 - 5005005
ER -