PRFormer: Matching Proposal and Reference Masks by Semantic and Spatial Similarity for Few-Shot Semantic Segmentation

Guangyu Gao*, Anqi Zhang, Jianbo Jiao, Chi Harold Liu, Yunchao Wei

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Few-shot Semantic Segmentation (FSS) aims to accurately segment query images with guidance from only a few annotated support images. Previous methods typically rely on pixel-level feature correlations, operating in a many-to-many (pixels-to-pixels) or few-to-many (prototype-to-pixels) manner. The recent mask-proposal classification pipeline in semantic segmentation enables a more efficient few-to-few (prototype-to-prototype) correlation between query proposal masks and support reference masks. However, existing methods of this kind still involve intermediate pixel-level feature correlation, which lowers efficiency. In this paper, we introduce the Proposal and Reference masks matching transFormer (PRFormer), designed to address mask matching rigorously in both the spatial and semantic aspects, in a thorough few-to-few manner. Following the mask-classification paradigm, PRFormer starts with a class-agnostic proposal generator that partitions the query image into proposal masks. It then compares the features corresponding to query proposal masks and support reference masks using two strategies: semantic matching based on feature similarity across prototypes, and spatial matching based on the mask intersection ratio. These strategies are implemented as the Prototype Contrastive Correlation (PrCC) and Prior-Proposals Intersection (PPI) modules, respectively; they improve matching precision and efficiency while eliminating dependence on pixel-level feature correlations. Additionally, we propose the category-discrimination NCE (cdNCE) loss to constrain the adapted prototypes, and the IoU-KLD loss to align the similarity vector with the corresponding IoUs between proposals and the ground truth. Since class-agnostic proposals tend to be more accurate for training classes than for novel classes in FSS, we further introduce Weighted Proposal Refinement (WPR), which refines the most confident masks with detailed features to yield more precise predictions.
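The few-to-few matching described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: all function names, shapes, and the simple weighted combination of the two scores are assumptions made for clarity.

```python
import numpy as np

def masked_avg_prototype(feat, mask):
    """Masked average pooling: collapse the features under a binary mask
    into a single prototype vector. feat: (C, H, W); mask: (H, W)."""
    m = mask.astype(feat.dtype)
    denom = m.sum() + 1e-6
    return (feat * m[None]).reshape(feat.shape[0], -1).sum(axis=1) / denom

def cosine(a, b):
    """Semantic similarity between two prototype vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-6))

def mask_iou(a, b):
    """Spatial similarity: intersection-over-union of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / (union + 1e-6)

def few_to_few_score(q_feat, prop_mask, s_feat, ref_mask, alpha=0.5):
    """Score one query proposal against one support reference by combining
    prototype-level semantic similarity with mask-level spatial overlap."""
    sem = cosine(masked_avg_prototype(q_feat, prop_mask),
                 masked_avg_prototype(s_feat, ref_mask))
    spa = mask_iou(prop_mask, ref_mask)
    return alpha * sem + (1 - alpha) * spa
```

Because each proposal is reduced to a single prototype and a single mask, scoring N proposals against a reference costs N prototype comparisons rather than a dense pixels-to-pixels correlation.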
Experiments on the popular Pascal-5i and COCO-20i benchmarks show that our few-to-few approach, PRFormer, outperforms previous methods, achieving 1-shot mIoU scores of 70.4% and 49.4%, respectively.
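The two training objectives mentioned in the abstract can be sketched in spirit as follows. These are generic stand-ins (an InfoNCE-style contrastive loss and a KL divergence between softmax-normalized distributions); the paper's exact cdNCE and IoU-KLD formulations may differ, and all names and shapes here are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cd_nce_loss(anchor, positive, negatives, tau=0.07):
    """InfoNCE-style sketch of a category-discrimination loss: pull an adapted
    prototype toward its class prototype, push it away from other classes."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-6)
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives]) / tau
    return -np.log(softmax(logits)[0] + 1e-12)

def iou_kld_loss(sim_scores, ious, tau=1.0):
    """KL divergence aligning the proposal-similarity distribution with the
    distribution of proposal-vs-ground-truth IoUs."""
    p = softmax(np.asarray(ious, dtype=float) / tau)        # target: IoU distribution
    q = softmax(np.asarray(sim_scores, dtype=float) / tau)  # prediction: similarity
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

The KL term rewards rankings in which proposals with higher ground-truth overlap also receive proportionally higher similarity, rather than only picking a single best proposal.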

Original language: English
Pages (from-to): 8161-8173
Number of pages: 13
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 35
Issue number: 8
DOIs
Publication status: Published - 2025
Externally published: Yes

Keywords

  • Few-shot learning
  • contrastive learning
  • mask matching
  • proposal masks
  • semantic segmentation

