“Car or Bus?” CLearSeg: CLIP-Enhanced Discrimination Among Resembling Classes for Few-Shot Semantic Segmentation

Anqi Zhang, Guangyu Gao*, Zhuocheng Lv, Yukun An

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Few-shot semantic segmentation aims to segment query images of unseen classes under the guidance of a few segmented support examples. However, existing models tend to confuse resembling classes (e.g., ‘car’ and ‘bus’) and thus generate erroneous predictions. To address this, we propose CLIP-enhanced discrimination among resembling classes for few-shot semantic segmentation (CLearSeg), which leverages information beyond the support images, including the class name, through Contrastive Language-Image Pretraining (CLIP) to discriminate between resembling classes. First, we modify the CLIP structure and design Sliding Attention Pooling (SAP) to construct the Text-Driven Activation (TDA) module, which learns Class-Specific Activation (CSA) maps from class names. Because the class name explicitly injects semantic information, the CSA maps exhibit clear distinctions among resembling classes. Meanwhile, to enrich the fine-grained features that ensure distinguishability, the Multi-Level Correlation (MLC) module extracts multi-level features of the support and query images and generates various correlation maps. A decoder then fuses the CSA map and correlation maps with the encoded features to obtain the final prediction. Experiments on the Pascal-5i and COCO-20i datasets show that CLearSeg outperforms previous methods, achieving mIoU of 69.2% and 48.9%, respectively, for 1-shot segmentation, with particular gains in distinguishing objects from resembling classes.
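As a rough illustration of how a class name can produce a per-pixel activation map, the sketch below scores each pixel feature against a single text embedding using cosine similarity. This is not the TDA module or SAP described in the abstract; the function names, the toy 2x2 feature map, and the two-dimensional "car" embedding are all hypothetical stand-ins chosen so the example runs without any model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def activation_map(pixel_features, text_embedding):
    """Score every pixel feature against one class-name embedding."""
    return [[cosine(f, text_embedding) for f in row] for row in pixel_features]

# Toy 2x2 feature map with 2-D features (hypothetical values, not CLIP output).
features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, 0.0]],
]
car_embedding = [1.0, 0.0]  # assumed stand-in for a text embedding of "car"

csa = activation_map(features, car_embedding)
for row in csa:
    print([round(score, 3) for score in row])
```

Pixels whose features align with the class embedding score near 1, while unrelated or opposing features score near 0 or below, which is the intuition behind using text to separate visually similar classes.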

Original language: English
Title of host publication: MultiMedia Modeling - 30th International Conference, MMM 2024, Proceedings
Editors: Stevan Rudinac, Marcel Worring, Cynthia Liem, Alan Hanjalic, Björn Þór Jónsson, Yoko Yamakata, Bei Liu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 172-186
Number of pages: 15
ISBN (Print): 9783031533044
DOIs
Publication status: Published - 2024
Event: 30th International Conference on MultiMedia Modeling, MMM 2024 - Amsterdam, Netherlands
Duration: 29 Jan 2024 to 2 Feb 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14554 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 30th International Conference on MultiMedia Modeling, MMM 2024
Country/Territory: Netherlands
City: Amsterdam
Period: 29/01/24 to 2/02/24

Keywords

  • CLIP
  • Few-Shot Learning
  • Resembling Class
  • Semantic Segmentation
