TY - JOUR
T1 - S2Net
T2 - Spatial-Aligned and Semantic-Discriminative Network for Remote Sensing Object Detection
AU - Yao, Jiayu
AU - Chen, He
AU - Xie, Yizhuang
AU - Zhang, Ning
AU - Yang, Mingxu
AU - Chen, Liang
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - With the rapid development of remote sensing technology, geographically located objects such as vehicles, ships, airplanes, and oil tanks have become important information resources in many civilian applications. However, remote sensing images (RSIs) still suffer from large scale variation and limited semantic detail. To tackle these challenges, a spatial-aligned and semantic-discriminative network, S2Net, is proposed. First, grouped spatial attention-based fusion (GSAF) is designed for balanced and spatial-aware multiscale feature fusion. Second, a box-regression loss, the edge-ratio IoU (ER-IoU) loss, is presented to improve convergence speed and localization accuracy; it explicitly optimizes the edge-to-edge alignment between anchor and ground-truth boxes, followed by aspect-ratio calibration. Finally, semantic-adaptive contrastive learning (SACL) is proposed, which pulls similar instances closer together and pushes dissimilar instances farther apart through straightforward alignment of the content queries. Experiments on three public datasets, NWPU VHR-10, DIOR, and RSOD, demonstrate that S2Net achieves superior accuracy compared with recent advanced methods, attaining 96.3%, 81.4%, and 96.8%, respectively.
AB - With the rapid development of remote sensing technology, geographically located objects such as vehicles, ships, airplanes, and oil tanks have become important information resources in many civilian applications. However, remote sensing images (RSIs) still suffer from large scale variation and limited semantic detail. To tackle these challenges, a spatial-aligned and semantic-discriminative network, S2Net, is proposed. First, grouped spatial attention-based fusion (GSAF) is designed for balanced and spatial-aware multiscale feature fusion. Second, a box-regression loss, the edge-ratio IoU (ER-IoU) loss, is presented to improve convergence speed and localization accuracy; it explicitly optimizes the edge-to-edge alignment between anchor and ground-truth boxes, followed by aspect-ratio calibration. Finally, semantic-adaptive contrastive learning (SACL) is proposed, which pulls similar instances closer together and pushes dissimilar instances farther apart through straightforward alignment of the content queries. Experiments on three public datasets, NWPU VHR-10, DIOR, and RSOD, demonstrate that S2Net achieves superior accuracy compared with recent advanced methods, attaining 96.3%, 81.4%, and 96.8%, respectively.
KW - Box regression
KW - contrastive learning
KW - detection transformer (DETR)
KW - multiscale feature fusion
KW - remote sensing object detection (RSOD)
UR - https://www.scopus.com/pages/publications/105019681550
U2 - 10.1109/TGRS.2025.3622254
DO - 10.1109/TGRS.2025.3622254
M3 - Article
AN - SCOPUS:105019681550
SN - 0196-2892
VL - 63
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5648018
ER -