LCCo: Lending CLIP to co-segmentation

Xin Duan, Yan Yang, Liyuan Pan*, Xiabi Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper studies co-segmenting common semantic objects in a set of images. Existing works either rely on carefully engineered networks to mine implicit semantics in visual features or require extra data (i.e., classification labels) for training. In this paper, we leverage the contrastive language-image pre-training (CLIP) framework for the task. With a backbone segmentation network that processes each image from the set, we introduce semantics from CLIP into the backbone features, refining them in a coarse-to-fine manner with three key modules: (i) an image set feature correspondence module, encoding globally consistent semantics of the image set; (ii) a CLIP interaction module, using CLIP-mined common semantics of the image set to refine the backbone features; (iii) a CLIP regularization module, drawing CLIP towards the co-segmentation task by identifying and using the best CLIP semantic to regularize the backbone features. Experiments on four standard co-segmentation benchmark datasets show that our method outperforms state-of-the-art methods.
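The abstract only names the three modules, so the sketch below is a hypothetical illustration of how such a coarse-to-fine refinement could be wired together, not the paper's implementation. All class names (ImageSetCorrespondence, CLIPInteraction, CoSegHead), tensor shapes, and fusion operations (cross-image attention over the set, a mean CLIP embedding used as a gating prior) are assumptions made for illustration; the third module, CLIP regularization, is assumed to act as a training-time term and is omitted here.

```python
# Hypothetical sketch of a coarse-to-fine co-segmentation head guided by CLIP.
# Shapes, module internals, and fusion choices are assumptions, not the paper's code.
import torch
import torch.nn as nn

class ImageSetCorrespondence(nn.Module):
    """Encodes globally consistent semantics across the image set
    (assumed here: each image attends to tokens from the whole set)."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, feats):                              # feats: (N, C, H, W)
        n, c, h, w = feats.shape
        tokens = feats.flatten(2).permute(0, 2, 1)         # (N, HW, C)
        set_tokens = tokens.reshape(1, n * h * w, c)       # pooled set tokens
        kv = set_tokens.expand(n, -1, -1)
        out, _ = self.attn(tokens, kv, kv)                 # cross-image attention
        return out.permute(0, 2, 1).reshape(n, c, h, w)

class CLIPInteraction(nn.Module):
    """Refines backbone features with a common semantic mined from CLIP
    image embeddings of the set (assumed: mean embedding as a gating prior)."""
    def __init__(self, dim, clip_dim):
        super().__init__()
        self.proj = nn.Linear(clip_dim, dim)

    def forward(self, feats, clip_embeds):                 # clip_embeds: (N, clip_dim)
        common = self.proj(clip_embeds.mean(0))            # shared semantic vector (dim,)
        gate = torch.sigmoid(
            torch.einsum('nchw,c->nhw', feats, common)).unsqueeze(1)
        return feats + feats * gate                        # gated residual refinement

class CoSegHead(nn.Module):
    """Coarse-to-fine pipeline: set correspondence -> CLIP interaction -> mask.
    The CLIP regularization module would add a training loss, not shown here."""
    def __init__(self, dim=256, clip_dim=512):
        super().__init__()
        self.corr = ImageSetCorrespondence(dim)
        self.interact = CLIPInteraction(dim, clip_dim)
        self.decoder = nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, backbone_feats, clip_embeds):
        x = self.corr(backbone_feats)                      # coarse: set-level consistency
        x = self.interact(x, clip_embeds)                  # fine: CLIP-guided refinement
        return torch.sigmoid(self.decoder(x))              # per-pixel co-segmentation mask

# Usage with random tensors standing in for a 4-image set.
feats = torch.randn(4, 256, 16, 16)                        # backbone features
clip_embeds = torch.randn(4, 512)                          # CLIP image embeddings
masks = CoSegHead()(feats, clip_embeds)                    # (4, 1, 16, 16)
```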

Original language: English
Article number: 111252
Journal: Pattern Recognition
Volume: 161
DOI
Publication status: Published - May 2025
