A Dual-Branch Network Based on ViT and Mamba for Semantic Segmentation of Remote Sensing Image

Ke An, Ying Wang, Liang Chen, Yupie Wang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Semantic segmentation of remote sensing images has significant applications across various scenarios. The prevailing frameworks are Convolutional Neural Networks (CNNs) and Transformers. However, CNNs are limited by the receptive field of convolutions, while Transformers are constrained by computational complexity, which restricts attention calculations to local windows and prevents them from modeling long-range dependencies effectively. The efficient Mamba architecture, characterized by linear complexity, offers a promising solution to these challenges. Inspired by Mamba, we propose a dual-branch network based on the Vision Transformer (ViT) and Mamba. The ViT branch incorporates the Swin Transformer to model spatial details while keeping computational complexity within acceptable bounds. Complementarily, the Mamba branch efficiently captures global context and long-range dependencies. Additionally, to suppress the noise and conflicting information that arise when fusing features from different frameworks, we design the Cross-Model Fusion Module (CMFM) and the Cross-Model Relevance Loss (CMRLoss) to achieve semantic consistency in the fusion process. Experiments on the widely used GaoFen-2 and iSAID datasets demonstrate the advantages of the proposed approach over leading methods in the field.
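The complexity argument in the abstract can be made concrete with a rough cost comparison. The sketch below is illustrative only and is not taken from the paper: the two attention formulas are the standard per-layer estimates for global and windowed multi-head self-attention reported for the Swin Transformer, while `linear_scan_cost` is a hypothetical order-of-magnitude proxy for a Mamba-style selective scan, with the state size `d_state` assumed to be 16.

```python
def global_attention_cost(h: int, w: int, c: int) -> int:
    """Approximate cost of global self-attention on an h x w map with c channels.

    The 2 * n^2 * c term is quadratic in the number of tokens n = h * w.
    """
    n = h * w
    return 4 * n * c**2 + 2 * n**2 * c


def window_attention_cost(h: int, w: int, c: int, m: int = 7) -> int:
    """Approximate cost of window attention with m x m windows.

    Restricting attention to local windows makes the cost linear in n,
    at the price of no direct interaction between distant tokens.
    """
    n = h * w
    return 4 * n * c**2 + 2 * m**2 * n * c


def linear_scan_cost(h: int, w: int, c: int, d_state: int = 16) -> int:
    """Hypothetical proxy for a selective scan: O(n * c * d_state).

    This is an assumption for illustration, not a measured operation count.
    """
    return h * w * c * d_state


if __name__ == "__main__":
    # Compare how each cost grows as the feature map side length doubles.
    for side in (56, 112, 224):
        print(
            side,
            global_attention_cost(side, side, 96),
            window_attention_cost(side, side, 96),
            linear_scan_cost(side, side, 96),
        )
```

Under these assumptions, doubling the side length of the feature map multiplies the window-attention and scan costs by four, whereas the quadratic term of global attention grows by a factor of sixteen. This is the trade-off the abstract refers to when it pairs a windowed ViT branch with a Mamba branch for global context.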

Original language: English
Title of host publication: IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331515669
Publication status: Published - 2024
Event: 2nd IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2024 - Zhuhai, China
Duration: 22 Nov 2024 to 24 Nov 2024

Publication series

Name: IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2024

Conference

Conference: 2nd IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2024
Country/Territory: China
City: Zhuhai
Period: 22/11/24 to 24/11/24

Keywords

  • Mamba
  • Remote Sensing Image
  • Semantic Segmentation
  • Vision Transformer
