TY - JOUR
T1 - TranSTD: A Wavelet-Driven Transformer-Based SAR Target Detection Framework With Adaptive Feature Enhancement and Fusion
AU - Xi, Bobo
AU - Chen, Jiaqi
AU - Huang, Yan
AU - Li, Jiaojiao
AU - Li, Yunsong
AU - Li, Zan
AU - Xia, Xiang Gen
N1 - Publisher Copyright: © 2008-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - Target detection in Synthetic Aperture Radar (SAR) images is of great importance in civilian monitoring and military reconnaissance. However, the unique speckle noise inherent in SAR images leads to semantic information loss, while traditional convolutional neural network downsampling methods exacerbate this issue, impacting detection accuracy and robustness. Moreover, some dense target scenarios and weak scattering features of targets make it challenging to achieve sufficient feature discriminability, adding complexity to the detection task. In addition, the multiscale characteristic of SAR targets presents difficulties in balancing detection performance with computational efficiency in complex scenes. To tackle these difficulties, this article introduces a wavelet-driven transformer-based SAR target detection framework called TranSTD. Specifically, it incorporates the Haar wavelet dynamic downsampling and semantic preserving dynamic downsampling modules, which effectively suppress noise and preserve semantic information using techniques such as Haar wavelet denoise and input-driven dynamic pooling downsampling. Furthermore, the SAR adaptive convolution (SAC) bottleneck is proposed for enhancing the discrimination of features. To optimize performance and efficiency across varying scene complexities, a multiscale SAR attention fusion encoder is developed. Extensive experiments are carried out on three datasets, showing that our proposed algorithm outperforms the current state-of-the-art benchmarks in SAR target detection, offering a robust solution for the detection of targets in complex SAR scenes.
KW - Dynamic downsampling
KW - multiscale SAR attention fusion encoder (MSAF)
KW - target detection
KW - wavelet denoise
UR - https://www.scopus.com/pages/publications/105023894893
U2 - 10.1109/JSTARS.2025.3639785
DO - 10.1109/JSTARS.2025.3639785
M3 - Article
AN - SCOPUS:105023894893
SN - 1939-1404
VL - 19
SP - 1197
EP - 1211
JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ER -