DM-PCL: Text-Driven Dual-Modal Prototype Consistency Learning for Weakly-Supervised Few-Shot Part Segmentation

  • Mengya Han
  • Yong Luo*
  • Han Hu
  • Zengmao Wang
  • Lefei Zhang
  • Bo Du
  • Ling Yu Duan
  • Dacheng Tao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Few-shot part segmentation is essential for fine-grained visual understanding, but it remains challenging in the absence of pixel-level annotations. This motivates us to introduce a more practical task setting, weakly-supervised few-shot part segmentation, where only part-level textual labels (e.g., textual part descriptions) are provided for support images. This setting is difficult due to the semantic-visual gap and the lack of pixel-level supervision. To address this challenge, we propose text-driven dual-modal prototype consistency learning (DM-PCL), which predicts pseudo masks for both support and query images using part-level textual labels and learns consistent part prototypes across diverse images and modalities to facilitate accurate part segmentation. Specifically, DM-PCL introduces: (i) a pseudo mask generation (PMG) module, which generates pseudo masks by comparing image features with textual part prototypes derived from part-level textual labels; (ii) a text-driven spatial interaction (TSI) module that enriches visual features with semantic knowledge to enhance part perception; and (iii) a dual-modal prototype consistency learning (DPCL) module that enforces consistency between part prototypes across different images and modalities. Final segmentation is performed by comparing query features with both visual and textual part prototypes via a dual-modal cooperative segmentation strategy. Extensive experiments on benchmark datasets demonstrate that our method significantly outperforms existing approaches, achieving state-of-the-art performance in weakly-supervised few-shot part segmentation.
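The pseudo mask generation idea described above, comparing per-pixel image features with textual part prototypes, can be sketched as follows. This is an illustrative toy sketch, not the paper's implementation: the function name, shapes, and the use of cosine similarity with a hard argmax assignment are assumptions for clarity, and the feature and prototype embeddings are taken as precomputed.

```python
import numpy as np

def generate_pseudo_mask(image_feats, text_protos):
    """Toy sketch of the PMG idea: assign each pixel to the part whose
    textual prototype it matches best under cosine similarity.

    image_feats: (H, W, D) per-pixel visual features (assumed precomputed)
    text_protos: (P, D) one embedding per part-level textual label
    Returns an (H, W) pseudo mask of part indices in [0, P).
    """
    # L2-normalize both modalities so dot products equal cosine similarity
    f = image_feats / (np.linalg.norm(image_feats, axis=-1, keepdims=True) + 1e-8)
    t = text_protos / (np.linalg.norm(text_protos, axis=-1, keepdims=True) + 1e-8)
    sim = f @ t.T                 # (H, W, P) pixel-to-part similarities
    return sim.argmax(axis=-1)    # hard assignment -> pseudo part mask

# Hypothetical usage: a 4x4 feature map and 3 part prototypes
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 4, 8))
protos = rng.normal(size=(3, 8))
mask = generate_pseudo_mask(feats, protos)   # shape (4, 4), values in {0, 1, 2}
```

In the paper's setting such pseudo masks supply the missing pixel-level supervision for both support and query images; the sketch shows only the similarity-and-assign core of that step.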

Original language: English
Pages (from-to): 7553-7569
Number of pages: 17
Journal: International Journal of Computer Vision
Volume: 133
Issue number: 11
DOIs
Publication status: Published - Nov 2025
Externally published: Yes

Keywords

  • Dual-modal cooperative segmentation
  • Dual-modal prototype consistency learning
  • Few-shot part segmentation
  • Text-driven spatial interaction
  • Weakly-supervised
