Prototype-based Semantic Segmentation

Tianfei Zhou, Wenguan Wang

Research output: Contribution to journal, Article, peer-reviewed

9 Citations (Scopus)

Abstract

Deep learning based semantic segmentation solutions have yielded compelling results over the preceding decade. They encompass diverse network architectures (FCN based or attention based), along with various mask decoding schemes (parametric softmax based or pixel-query based). Despite the divergence, they can be grouped within a unified framework by interpreting the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, we reveal inherent limitations within the parametric segmentation regime, and accordingly develop a nonparametric alternative based on non-learnable prototypes. In contrast to previous approaches that entail the learning of a single weight/query vector per class in a fully parametric manner, our approach represents each class as a set of non-learnable prototypes, relying solely upon the mean features of training pixels within that class. The pixel-wise prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to accommodate an arbitrary number of classes with a constant number of learnable parameters. Through empirical evaluation with FCN based and Transformer based segmentation models (i.e., HRNet, Swin, SegFormer, Mask2Former) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework shows superior performance on standard segmentation datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), as well as in large-vocabulary semantic segmentation scenarios. We expect that this study will provoke a rethink of the current de facto semantic segmentation model design.
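
The nearest-prototype idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`build_prototypes`, `predict`), the use of cosine similarity, the plain k-means style grouping inside each class, and the default of two prototypes per class are all assumptions made here for illustration. The sketch only shows that prototypes can be computed as means of embedded pixels, without learnable per-class weights, and that prediction reduces to retrieving the most similar prototype.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_prototypes(features, labels, num_classes,
                     protos_per_class=2, iters=5, seed=0):
    """Compute non-learnable prototypes as means of pixel embeddings.

    features: (N, D) float array of pixel embeddings.
    labels:   (N,) int array of class ids in [0, num_classes).
    Returns an array of shape (num_classes, protos_per_class, D).

    Assumption: a few k-means style iterations stand in for whatever
    within-class grouping the paper actually uses.
    """
    rng = np.random.default_rng(seed)
    features = l2_normalize(features)
    dim = features.shape[1]
    protos = np.zeros((num_classes, protos_per_class, dim))
    for c in range(num_classes):
        feats_c = features[labels == c]
        if len(feats_c) == 0:
            continue  # class absent from the data: prototypes stay at zero
        # Initialise the centers from randomly chosen pixels of this class.
        idx = rng.choice(len(feats_c), size=protos_per_class,
                         replace=len(feats_c) < protos_per_class)
        centers = feats_c[idx].copy()
        for _ in range(iters):
            # Assign each pixel to its most similar center.
            assign = np.argmax(feats_c @ centers.T, axis=1)
            for k in range(protos_per_class):
                members = feats_c[assign == k]
                if len(members) > 0:
                    # A prototype is the normalised mean feature of its members.
                    centers[k] = l2_normalize(members.mean(axis=0))
        protos[c] = centers
    return protos


def predict(features, protos):
    """Label each pixel with the class of its most similar prototype."""
    features = l2_normalize(features)
    num_classes, per_class, dim = protos.shape
    sims = features @ protos.reshape(num_classes * per_class, dim).T
    # Score of a class = similarity to its best-matching prototype.
    class_scores = sims.reshape(-1, num_classes, per_class).max(axis=2)
    return class_scores.argmax(axis=1)
```

In this sketch, adding a class only adds rows to the prototype array; nothing in the classifier itself is trained, which is one way to read the abstract's statement that the number of learnable parameters stays constant as the number of classes grows.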

Original language: English
Pages (from-to): 1-15
Number of pages: 15
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication status: Accepted/In press - 2024