Multi-Scale interaction and enhancement network for referring camouflaged objects image segmentation

  • Qiyang Sun
  • Xin Zhang
  • Xia Wang*
  • Shiwei Xu
  • Yuyang Li
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Camouflaged Objects Detection (COD) aims to identify objects that blend seamlessly into their surrounding environments. Existing COD methods treat the task as a binary segmentation problem built on Salient Object Detection techniques, which separate objects from the background. Although these methods have been widely applied, their inability to identify object categories limits their scope of application. Moreover, multi-target selection and localization rely heavily on expert-driven post-processing, resulting in poor interactivity. To address these limitations, we reformulate COD as a Referring Image Segmentation (RIS) problem, enabling precise localization and segmentation of objects specified through natural language instructions. Accordingly, this paper proposes a novel RIS framework for the COD task, named MSIENet, which integrates a language encoder, an image encoder, and a multi-modal fusion module. The framework bridges the modality gap between visual and linguistic features through a cross-attention-based fusion and alignment module. MSIENet also contains two key components, a multi-scale edge enhancement module and a texture enhancement module, which aggregate and refine texture details and boundary information to facilitate the generation of high-quality segmentation masks. We also collect a language-image camouflaged-object dataset, Ref-ACOD, establishing a rigorous evaluation benchmark for COD tasks based on RIS approaches. Experiments demonstrate that MSIENet surpasses state-of-the-art RIS methods on COD tasks, improving mIoU and oIoU over LAVT by 8% and 14.5%, respectively. All datasets are available at http://github.com/samsunq/Ref-ACOD.git
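The abstract reports results in mIoU and oIoU but does not define them. The sketch below assumes the convention common in the referring image segmentation literature: mIoU averages the per-sample IoU, while oIoU divides the total intersection by the total union across all samples. The function names, the list-of-rows mask representation, and the handling of empty masks are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence, Tuple

# Assumption: a mask is a sequence of rows, each a sequence of 0/1 values.
Mask = Sequence[Sequence[int]]


def intersection_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count intersection and union pixels of two same-shaped binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter, union


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """mIoU: mean of per-sample IoU, so every sample counts equally."""
    ious = []
    for pred, target in zip(preds, targets):
        inter, union = intersection_union(pred, target)
        # Assumption: two empty masks are scored as a perfect match.
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)


def overall_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """oIoU: total intersection over total union, so large objects weigh more."""
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        inter, union = intersection_union(pred, target)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 1.0


# Two toy 2x2 samples: the first over-segments, the second is exact.
example_preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
example_targets = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
example_miou = mean_iou(example_preds, example_targets)
example_oiou = overall_iou(example_preds, example_targets)
```

On the toy data the per-sample IoUs are 1/2 and 1, so mIoU is 0.75, while oIoU pools the counts and gives 2/3. The two metrics can therefore move by different amounts for the same model, which is consistent with the abstract reporting separate gains for each.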

Original language: English
Article number: 114883
Journal: Knowledge-Based Systems
Volume: 331
Publication status: Published - 3 Dec 2025
Externally published: Yes

Keywords

  • Camouflaged objects detection
  • Cross-attention
  • Multi-scale features
  • Multi-task learning
  • Referring image segmentation
