TY - JOUR
T1 - Encouraging the Mutual Interact between Dataset-Level and Image-Level Context for Semantic Segmentation of Remote Sensing Image
AU - An, Ke
AU - Wang, Yupei
AU - Chen, Liang
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Recently, semantic segmentation of remote sensing images has witnessed rapid advancement with the adoption of deep neural networks. Contextual cues, referring to the long-range correlation between pixels, are crucial for achieving accurate segmentation results, particularly for objects with less discriminative characteristics in these images. Currently, most studies are centered on incorporating contextual cues by aggregating context information at the dataset level or image level. However, current research often treats contextual cue modeling at the dataset level and image level as independent procedures, neglecting the intrinsic correlation between these two feature levels. Consequently, the obtained contextual cues are suboptimal. This issue is particularly critical in the semantic segmentation of remote sensing images. To address this, we propose to encourage mutual interaction between dataset-level and image-level contextual cues. Firstly, we propose an interactive dataset-image context aggregation scheme to obtain complementary and consistent multilevel contextual cues. Additionally, we introduce a parallel feature interaction network (PFI-Net) that progressively extracts and fuses features across multiple layers, enabling effective integration of multilevel contexts. Furthermore, we introduce an enhanced shifted window-based cross-attention mechanism to augment model efficiency. Extensive experimental results on the widely used Vaihingen dataset, GaoFen-2 dataset, and instance segmentation in aerial images dataset (iSAID) effectively demonstrate the superiority of our proposed method over the other state-of-the-art methods.
AB - Recently, semantic segmentation of remote sensing images has witnessed rapid advancement with the adoption of deep neural networks. Contextual cues, referring to the long-range correlation between pixels, are crucial for achieving accurate segmentation results, particularly for objects with less discriminative characteristics in these images. Currently, most studies are centered on incorporating contextual cues by aggregating context information at the dataset level or image level. However, current research often treats contextual cue modeling at the dataset level and image level as independent procedures, neglecting the intrinsic correlation between these two feature levels. Consequently, the obtained contextual cues are suboptimal. This issue is particularly critical in the semantic segmentation of remote sensing images. To address this, we propose to encourage mutual interaction between dataset-level and image-level contextual cues. Firstly, we propose an interactive dataset-image context aggregation scheme to obtain complementary and consistent multilevel contextual cues. Additionally, we introduce a parallel feature interaction network (PFI-Net) that progressively extracts and fuses features across multiple layers, enabling effective integration of multilevel contexts. Furthermore, we introduce an enhanced shifted window-based cross-attention mechanism to augment model efficiency. Extensive experimental results on the widely used Vaihingen dataset, GaoFen-2 dataset, and instance segmentation in aerial images dataset (iSAID) effectively demonstrate the superiority of our proposed method over the other state-of-the-art methods.
KW - Contextual cue
KW - remote sensing image
KW - semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85182928868&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2024.3352582
DO - 10.1109/TGRS.2024.3352582
M3 - Article
AN - SCOPUS:85182928868
SN - 0196-2892
VL - 62
SP - 1
EP - 16
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5606116
ER -