Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation

  • Fei Zhang
  • , Tianfei Zhou
  • , Boyang Li
  • , Hao He
  • , Chaofan Ma
  • , Tianjiao Zhang
  • , Jiangchao Yao*
  • , Ya Zhang
  • , Yanfeng Wang
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

22 Citations (Scopus)

Abstract

This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using mere image-text pairs.Existing works turn to enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and perform the group-text alignment.Nevertheless, these methods suffer from a granularity inconsistency regarding the usage of group tokens, which are aligned in the all-to-one v.s.one-to-one manners during the training and inference phases, respectively.We argue that this discrepancy arises from the lack of elaborate supervision for each group token.To bridge this granularity gap, this paper explores explicit supervision for the group tokens from the prototypical knowledge.To this end, this paper proposes the non-learnable prototypical regularization (NPR) where non-learnable prototypes are estimated from source features to serve as supervision and enable contrastive matching of the group tokens.This regularization encourages the group tokens to segment objects with less redundancy and capture more comprehensive semantic regions, leading to increased compactness and richness.Based on NPR, we propose the prototypical guidance segmentation network (PGSeg) that incorporates multi-modal regularization by leveraging prototypical sources from both images and texts at different levels, progressively enhancing the segmentation capability with diverse prototypical patterns.Experimental results show that our proposed method achieves state-of-the-art performance on several benchmark datasets.The source code is available at https://github.com/Ferenas/PGSeg.

Original languageEnglish
Title of host publicationAdvances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
EditorsA. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
PublisherNeural information processing systems foundation
ISBN (Electronic)9781713899921
Publication statusPublished - 2023
Event37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: 10 Dec 202316 Dec 2023

Publication series

NameAdvances in Neural Information Processing Systems
Volume36
ISSN (Print)1049-5258

Conference

Conference37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/TerritoryUnited States
CityNew Orleans
Period10/12/2316/12/23

Fingerprint

Dive into the research topics of 'Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation'. Together they form a unique fingerprint.

Cite this