TY - JOUR
T1 - CGM2Net
T2 - Cloud-Guided Mamba-Based Multitask Network for Multimodal Remote Sensing Semantic Segmentation and Cloud Removal
AU - Li, Jianhao
AU - Wang, Jue
AU - Shi, Hao
AU - Dong, Shan
AU - Liu, Wenchao
AU - Chen, Liang
AU - Li, Wei
N1 - Publisher Copyright: © 2008-2012 IEEE.
PY - 2026
Y1 - 2026
AB - Semantic segmentation and cloud removal via optical-synthetic aperture radar (SAR) fusion are crucial tasks in Earth observation. Existing works have proposed multitask architectures to simultaneously address segmentation and cloud removal, but they lack targeted modeling of under-cloud features, leading to blurred ground-object reconstruction and poor fine-grained recognition performance in thick-cloud regions. To address these challenges, this article proposes a Cloud-Guided Mamba-based Multitask Network (CGM2Net) for end-to-end multimodal segmentation and cloud removal. Specifically, CGM2Net utilizes a dual-stream encoder and integrates three key modules at each stage: a cloud-guided feature restoration module, which, under cloud-mask-guided gating, exploits spatial- and channel-wise cross-modal correlations to reconstruct under-cloud optical features and suppress SAR speckle; a cloud-guided semantic interaction module, which performs mask-aware state-space modulation to enable selective, region-aware cross-modal semantic exchange; and a Mamba-based fusion module, which adaptively fuses the enhanced multimodal features to fully exploit complementary information across modalities. Two task-specific decoders jointly optimize segmentation and cloud removal, thereby promoting mutual enhancement between the semantic prior and visual restoration. Experiments on M3M-CR and LuojiaSET-OSFCR demonstrate that CGM2Net achieves state-of-the-art performance on both tasks. Ablation studies and feature visualizations further validate the complementary roles and effectiveness of the proposed modules.
KW - Cloud removal
KW - multimodal segmentation
KW - multitask learning
KW - remote sensing data
UR - https://www.scopus.com/pages/publications/105036341745
U2 - 10.1109/JSTARS.2026.3684570
DO - 10.1109/JSTARS.2026.3684570
M3 - Article
AN - SCOPUS:105036341745
SN - 1939-1404
VL - 19
SP - 14358
EP - 14374
JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ER -