TY - JOUR
T1 - PartSeg
T2 - Few-shot part segmentation via part-aware prompt learning
AU - Han, Mengya
AU - Zheng, Heliang
AU - Wang, Chaoyue
AU - Luo, Yong
AU - Hu, Han
AU - Zhang, Jing
AU - Du, Bo
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2025/6
Y1 - 2025/6
N2 - In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples. It has been found that leveraging the textual space of a powerful pre-trained image-language model, such as CLIP, can substantially enhance the learning of visual features in few-shot tasks. However, CLIP-based methods primarily focus on high-level visual features that are fully aligned with textual features representing the “summary” of the image, and therefore often struggle to understand the concept of object parts through textual descriptions. To address this, we propose PartSeg, a novel method that learns part-aware prompts to grasp the concept of “part” and better utilize the textual space of CLIP to enhance few-shot part segmentation. Specifically, we design a part-aware prompt learning module that generates part-aware prompts, enabling the CLIP model to better understand the concept of “part” and effectively utilize its textual space. The part-aware prompt learning module includes a part-specific prompt generator that produces part-specific tokens for each part class. Furthermore, since the concept of a given part is general across different object categories, we establish relationships between these parts to estimate part-shared tokens during the prompt learning process. Finally, the part-specific and part-shared tokens, along with the textual tokens encoded from textual descriptions of parts (i.e., part labels), are combined to form the part-aware prompt used to generate textual prototypes for segmentation. We conduct extensive experiments on the PartImageNet and Pascal_Part datasets, and the results demonstrate that our proposed method achieves state-of-the-art performance.
AB - In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples. It has been found that leveraging the textual space of a powerful pre-trained image-language model, such as CLIP, can substantially enhance the learning of visual features in few-shot tasks. However, CLIP-based methods primarily focus on high-level visual features that are fully aligned with textual features representing the “summary” of the image, and therefore often struggle to understand the concept of object parts through textual descriptions. To address this, we propose PartSeg, a novel method that learns part-aware prompts to grasp the concept of “part” and better utilize the textual space of CLIP to enhance few-shot part segmentation. Specifically, we design a part-aware prompt learning module that generates part-aware prompts, enabling the CLIP model to better understand the concept of “part” and effectively utilize its textual space. The part-aware prompt learning module includes a part-specific prompt generator that produces part-specific tokens for each part class. Furthermore, since the concept of a given part is general across different object categories, we establish relationships between these parts to estimate part-shared tokens during the prompt learning process. Finally, the part-specific and part-shared tokens, along with the textual tokens encoded from textual descriptions of parts (i.e., part labels), are combined to form the part-aware prompt used to generate textual prototypes for segmentation. We conduct extensive experiments on the PartImageNet and Pascal_Part datasets, and the results demonstrate that our proposed method achieves state-of-the-art performance.
KW - Few-shot part segmentation
KW - Part-aware prompt learning
KW - Part-shared tokens
KW - Part-specific tokens
KW - Pre-trained image-language model
UR - http://www.scopus.com/inward/record.url?scp=85214793690&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2024.111326
DO - 10.1016/j.patcog.2024.111326
M3 - Article
AN - SCOPUS:85214793690
SN - 0031-3203
VL - 162
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 111326
ER -