
Soft-Guided Open-Vocabulary Semantic Segmentation of Remote Sensing Images

  • Beijing Institute of Technology
  • National Key Laboratory of Science and Technology on Space-Born Intelligent Information Processing

Research output: Contribution to journal, Article, Peer-reviewed

Abstract

Open-vocabulary remote sensing (RS) semantic segmentation aims to assign both seen and unseen class labels to individual pixels in RS images. Existing models follow the “fine-tune” paradigm built on vision-language models (VLMs). However, because VLMs are predominantly tailored to natural scenes, these directly fine-tuned models often collapse onto the seen categories and are insensitive to RS semantic cues. This model collapse is closely related to image-text misalignment, which leaves such models struggling with the unique challenges of RS images, such as complex and diverse scenes and objects with large scale differences. To this end, we propose a soft-guided open-vocabulary RS semantic segmentation framework, the first to explore how to softly adapt VLMs to the downstream task of semantic segmentation of RS images. Concretely, instead of directly fine-tuning, we introduce a generalization compensation strategy that employs an additional frozen VLM encoder to provide implicit semantic guidance for the dynamic optimization of the visual representation. By introducing prior knowledge from the frozen encoder, this soft strategy compensates for potential losses incurred during fine-tuning, enhancing the model’s pixel-level perceptual alignment while avoiding model collapse. Next, to increase the sensitivity of the VLM’s textual and visual embeddings to RS semantic information, we present bias-guided image-text collaborative optimization, which achieves bilateral interaction of semantic information under the guidance of an RS scene bias. Finally, an improved upsampling decoder progressively refines and calibrates the cost map by integrating multiscale information and textual embeddings. Extensive experiments demonstrate that our method achieves state-of-the-art performance on widely used, challenging benchmarks.
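The abstract combines two generic ingredients: a cost map that scores every pixel embedding against every class text embedding, and soft guidance from a frozen encoder that keeps fine-tuned features from drifting. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes features are plain Python lists, builds the cost map from cosine similarity, and models "soft guidance" as a hypothetical feature-consistency penalty (mean of 1 minus cosine similarity) between fine-tuned and frozen pixel features. The names `cost_map`, `soft_guidance_loss`, `guided_objective`, `segment`, and the weight `0.1` are illustrative only.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def cost_map(pixel_feats: List[Vector], text_feats: List[Vector]) -> List[List[float]]:
    """Score each pixel embedding against each class text embedding.

    Returns a (num_pixels x num_classes) matrix of cosine similarities.
    """
    return [[cosine(p, t) for t in text_feats] for p in pixel_feats]


def soft_guidance_loss(tuned: List[Vector], frozen: List[Vector]) -> float:
    """Hypothetical consistency penalty: mean (1 - cosine) between fine-tuned
    and frozen-encoder pixel features. Zero when they point the same way."""
    return sum(1.0 - cosine(a, b) for a, b in zip(tuned, frozen)) / len(tuned)


def guided_objective(seg_loss: float, tuned: List[Vector], frozen: List[Vector],
                     weight: float = 0.1) -> float:
    """Assumed combination: task loss plus a weighted guidance term."""
    return seg_loss + weight * soft_guidance_loss(tuned, frozen)


def segment(costs: List[List[float]]) -> List[int]:
    """Assign each pixel the index of its highest-scoring class."""
    return [max(range(len(row)), key=row.__getitem__) for row in costs]


if __name__ == "__main__":
    # Toy 2-D embeddings for two classes, e.g. "building" and "water".
    classes = [[1.0, 0.0], [0.0, 1.0]]
    pixels = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]
    print(segment(cost_map(pixels, classes)))
```

Because the penalty only pulls fine-tuned features toward the frozen ones rather than freezing them outright, the encoder can still adapt to RS data; how strongly it is pulled is governed by the assumed `weight`.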

Original language: English
Article number: 5652216
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 63
DOI
Publication status: Published - 2025
