“Car or Bus?” CLearSeg: CLIP-Enhanced Discrimination Among Resembling Classes for Few-Shot Semantic Segmentation

Anqi Zhang, Guangyu Gao*, Zhuocheng Lv, Yukun An

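*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract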

Few-shot semantic segmentation aims at learning to segment query images of unseen classes with the guidance of limited segmented support examples. However, existing models tend to confuse resembling classes (e.g., ‘car’ and ‘bus’), thus generating erroneous predictions. To address this, we propose CLIP-Enhanced Discrimination Among Resembling Classes for Few-Shot Semantic Segmentation (CLearSeg), which leverages information beyond the support images, including the class name, through Contrastive Language-Image Pretraining (CLIP) to discriminate between resembling classes. First, we modify the CLIP structure and design Sliding Attention Pooling (SAP) to construct the Text-Driven Activation (TDA) module, which learns Class-Specific Activation (CSA) maps from class names. Since the class name explicitly injects semantic information, the CSA maps exhibit clear distinctions among resembling classes. Meanwhile, to enrich the fine-grained features that ensure distinguishability, the Multi-Level Correlation (MLC) module is designed to extract multi-level features of the support and query images and generate various correlation maps. We then apply a decoder to fuse the CSA map and correlation maps with the encoded features and obtain the final prediction. Experiments on the Pascal-5i and COCO-20i datasets show that CLearSeg outperforms previous methods, achieving mIoU of 69.2% and 48.9%, respectively, for 1-shot segmentation, with particular gains in distinguishing objects from resembling classes.
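The two kinds of maps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it does not include SAP, the modified CLIP structure, or the decoder, and the function names `cosine_activation_map` and `correlation_map` are hypothetical. It assumes per-pixel features and a class-name embedding already live in a shared space (as CLIP is designed to provide), and it substitutes a generic cosine-similarity prior for the paper's TDA and MLC modules.

```python
import numpy as np


def cosine_activation_map(pixel_feats, text_emb, eps=1e-8):
    """Toy stand-in for a class-specific activation map (assumption, not the paper's TDA).

    pixel_feats: (H, W, C) dense image features.
    text_emb:    (C,) embedding of the class name.
    Returns an (H, W) map, min-max scaled to [0, 1].
    """
    f = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + eps)
    t = text_emb / (np.linalg.norm(text_emb) + eps)
    sim = f @ t  # per-pixel cosine similarity, shape (H, W)
    return (sim - sim.min()) / (sim.max() - sim.min() + eps)


def correlation_map(query_feats, support_feats, support_mask, eps=1e-8):
    """Toy stand-in for one support/query correlation map (assumption, not the paper's MLC).

    For every query pixel, take the maximum cosine similarity to any
    foreground pixel of the support image.
    """
    h, w, c = query_feats.shape
    fg = support_feats[support_mask.astype(bool)]  # (N_fg, C)
    if fg.shape[0] == 0:
        # No foreground in the support mask: nothing to correlate with.
        return np.zeros((h, w))
    q = query_feats.reshape(-1, c)
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    fg = fg / (np.linalg.norm(fg, axis=-1, keepdims=True) + eps)
    return (q @ fg.T).max(axis=1).reshape(h, w)


if __name__ == "__main__":
    # Synthetic features: channel 0 plays the role of 'car', channel 1 of 'bus'.
    feats = np.zeros((4, 4, 4))
    feats[:, :2, 0] = 1.0  # left half looks like 'car'
    feats[:, 2:, 1] = 1.0  # right half looks like 'bus'
    car_text = np.array([1.0, 0.0, 0.0, 0.0])  # hypothetical 'car' name embedding
    car_mask = np.zeros((4, 4))
    car_mask[:, :2] = 1  # support annotation marks the 'car' half

    print(np.round(cosine_activation_map(feats, car_text), 2))
    print(np.round(correlation_map(feats, feats, car_mask), 2))
```

In a real pipeline the synthetic arrays would be replaced by encoder outputs, and both maps would be passed to a decoder together with the encoded features, as the abstract describes.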

Original language: English
Title of host publication: MultiMedia Modeling - 30th International Conference, MMM 2024, Proceedings
Editors: Stevan Rudinac, Marcel Worring, Cynthia Liem, Alan Hanjalic, Björn Þór Jónsson, Yoko Yamakata, Bei Liu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 172-186
Number of pages: 15
ISBN (Print): 9783031533044
Publication status: Published - 2024
Event: 30th International Conference on MultiMedia Modeling, MMM 2024 - Amsterdam, Netherlands
Duration: 29 Jan 2024 to 2 Feb 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14554 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 30th International Conference on MultiMedia Modeling, MMM 2024
Country/Territory: Netherlands
City: Amsterdam
Period: 29/01/24 to 2/02/24
