S2Net: Spatial-Aligned and Semantic-Discriminative Network for Remote Sensing Object Detection

Research output: Contribution to journal › Article › peer-review

Abstract

With the development of remote sensing technology, geographically located objects such as vehicles, ships, airplanes, and oil tanks have become an important information resource in many civilian applications. However, remote sensing images (RSIs) still suffer from huge scale variation and limited semantic detail. To tackle these challenges, a spatial-aligned and semantic-discriminative network, S2Net, is proposed. First, grouped spatial attention-based fusion (GSAF) is designed for balanced, spatially aware multiscale feature fusion. Second, a loss function for box regression, the edge-ratio IoU (ER-IoU) loss, is presented to improve convergence speed and localization accuracy; it explicitly optimizes the edge-to-edge alignment between anchor and ground-truth boxes, followed by aspect-ratio calibration. Finally, semantic-adaptive contrastive learning (SACL) is proposed, which pulls similar instances closer together and pushes dissimilar instances farther apart through a straightforward alignment of the content queries. Experiments on three public datasets, NWPU VHR-10, DIOR, and RSOD, demonstrate that S2Net achieves notable accuracy gains over recent advanced methods, attaining 96.3%, 81.4%, and 96.8%, respectively.
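The two loss-level ideas in the abstract lend themselves to small illustrations. Below is a minimal PyTorch sketch of an edge-ratio-style IoU loss for axis-aligned (x1, y1, x2, y2) boxes; the particular edge-alignment penalty (normalized by the enclosing box) and the CIoU-style aspect-ratio term are illustrative assumptions, not the paper's exact ER-IoU formulation.

```python
import torch

def er_iou_loss(pred, target, eps=1e-7):
    """Hedged sketch of an edge-ratio-style IoU loss (not the paper's exact form).

    pred, target: (N, 4) boxes in (x1, y1, x2, y2) format.
    """
    # Standard IoU term.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Edge-to-edge alignment: distance between matching edges,
    # normalized by the width/height of the smallest enclosing box.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    edge = ((pred[:, [0, 2]] - target[:, [0, 2]]).abs().sum(dim=1) / (cw + eps)
            + (pred[:, [1, 3]] - target[:, [1, 3]]).abs().sum(dim=1) / (ch + eps))

    # Aspect-ratio calibration (CIoU-style arctan difference, an assumption).
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    ratio = (4 / torch.pi ** 2) * (
        torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))
    ) ** 2

    return (1 - iou + edge + ratio).mean()
```

Likewise, the pull-together/push-apart behavior described for SACL can be sketched as a supervised contrastive loss over content-query embeddings with class labels; the temperature and the self-pair masking below are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def sacl_sketch(queries, labels, tau=0.1):
    """Supervised contrastive sketch: same-class queries attract, others repel.

    queries: (N, D) content-query embeddings; labels: (N,) class ids.
    """
    q = F.normalize(queries, dim=1)                       # unit-norm embeddings
    sim = q @ q.t() / tau                                 # pairwise similarities
    pos = (labels[:, None] == labels[None, :]).float()
    pos.fill_diagonal_(0)                                 # exclude self-pairs
    logits = sim - torch.eye(len(q), device=q.device) * 1e9
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos.sum(dim=1).clamp(min=1)
    return -(pos * log_prob).sum(dim=1).div(pos_count).mean()
```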

Original language: English
Article number: 5648018
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 63
DOIs
Publication status: Published - 2025

Keywords

  • Box regression
  • contrastive learning
  • detection transformer (DETR)
  • multiscale feature fusion
  • remote sensing object detection (RSOD)
