Prototype-based Semantic Segmentation

Tianfei Zhou; Wenguan Wang

doi:10.1109/TPAMI.2024.3387116

Prototype-based Semantic Segmentation

Tianfei Zhou, Wenguan Wang

计算机学院

Zhejiang University

科研成果: 期刊稿件 › 文章 › 同行评审

16 引用（Scopus）

摘要

Deep learning based semantic segmentation solutions have yielded compelling results over the preceding decade. They encompass diverse network architectures (FCN based or attention based), along with various mask decoding schemes (parametric softmax based or pixel-query based). Despite the divergence, they can be grouped within a unified framework by interpreting the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, we reveal inherent limitations within the parametric segmentation regime, and accordingly develop a nonparametric alternative based on non-learnable prototypes. In contrast to previous approaches that entail the learning of a single weight/query vector per class in a fully parametric manner, our approach represents each class as a set of non-learnable prototypes, relying solely upon the mean features of training pixels within that class. The pixel-wise prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to accommodate an arbitrary number of classes with a constant number of learnable parameters. Through empirical evaluation with FCN based and Transformer based segmentation models (i.e., HRNet, Swin, SegFormer, Mask2Former) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework shows superior performance on standard segmentation datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), as well as in large-vocabulary semantic segmentation scenarios. We expect that this study will provoke a rethink of the current <italic>de facto</italic> semantic segmentation model design.

源语言	英语
页（从-至）	1-15
页数	15
期刊	IEEE Transactions on Pattern Analysis and Machine Intelligence
DOI	https://doi.org/10.1109/TPAMI.2024.3387116
出版状态	已接受/待刊 - 2024

访问文件

10.1109/TPAMI.2024.3387116

其它文件与链接

链接到 Scopus 的出版物

引用此

Zhou, T., & Wang, W. (已接受/印刷中). Prototype-based Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-15. https://doi.org/10.1109/TPAMI.2024.3387116

@article{be8d65ba040e4812b4c78e8aad980ba1,

title = "Prototype-based Semantic Segmentation",

abstract = "Deep learning based semantic segmentation solutions have yielded compelling results over the preceding decade. They encompass diverse network architectures (FCN based or attention based), along with various mask decoding schemes (parametric softmax based or pixel-query based). Despite the divergence, they can be grouped within a unified framework by interpreting the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, we reveal inherent limitations within the parametric segmentation regime, and accordingly develop a nonparametric alternative based on non-learnable prototypes. In contrast to previous approaches that entail the learning of a single weight/query vector per class in a fully parametric manner, our approach represents each class as a set of non-learnable prototypes, relying solely upon the mean features of training pixels within that class. The pixel-wise prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to accommodate an arbitrary number of classes with a constant number of learnable parameters. Through empirical evaluation with FCN based and Transformer based segmentation models (i.e., HRNet, Swin, SegFormer, Mask2Former) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework shows superior performance on standard segmentation datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), as well as in large-vocabulary semantic segmentation scenarios. We expect that this study will provoke a rethink of the current de facto semantic segmentation model design.",

keywords = "Image segmentation, Measurement, Nonparametric Classification, Online Clustering, Prototype, Prototypes, Semantic Segmentation, Semantic segmentation, Semantics, Transformers, Vectors",

author = "Tianfei Zhou and Wenguan Wang",

note = "Publisher Copyright: IEEE",

year = "2024",

doi = "10.1109/TPAMI.2024.3387116",

language = "English",

pages = "1--15",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - Prototype-based Semantic Segmentation

AU - Zhou, Tianfei

AU - Wang, Wenguan

N1 - Publisher Copyright: IEEE

PY - 2024

Y1 - 2024

N2 - Deep learning based semantic segmentation solutions have yielded compelling results over the preceding decade. They encompass diverse network architectures (FCN based or attention based), along with various mask decoding schemes (parametric softmax based or pixel-query based). Despite the divergence, they can be grouped within a unified framework by interpreting the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, we reveal inherent limitations within the parametric segmentation regime, and accordingly develop a nonparametric alternative based on non-learnable prototypes. In contrast to previous approaches that entail the learning of a single weight/query vector per class in a fully parametric manner, our approach represents each class as a set of non-learnable prototypes, relying solely upon the mean features of training pixels within that class. The pixel-wise prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to accommodate an arbitrary number of classes with a constant number of learnable parameters. Through empirical evaluation with FCN based and Transformer based segmentation models (i.e., HRNet, Swin, SegFormer, Mask2Former) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework shows superior performance on standard segmentation datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), as well as in large-vocabulary semantic segmentation scenarios. We expect that this study will provoke a rethink of the current de facto semantic segmentation model design.

AB - Deep learning based semantic segmentation solutions have yielded compelling results over the preceding decade. They encompass diverse network architectures (FCN based or attention based), along with various mask decoding schemes (parametric softmax based or pixel-query based). Despite the divergence, they can be grouped within a unified framework by interpreting the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, we reveal inherent limitations within the parametric segmentation regime, and accordingly develop a nonparametric alternative based on non-learnable prototypes. In contrast to previous approaches that entail the learning of a single weight/query vector per class in a fully parametric manner, our approach represents each class as a set of non-learnable prototypes, relying solely upon the mean features of training pixels within that class. The pixel-wise prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to accommodate an arbitrary number of classes with a constant number of learnable parameters. Through empirical evaluation with FCN based and Transformer based segmentation models (i.e., HRNet, Swin, SegFormer, Mask2Former) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework shows superior performance on standard segmentation datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), as well as in large-vocabulary semantic segmentation scenarios. We expect that this study will provoke a rethink of the current de facto semantic segmentation model design.

KW - Image segmentation

KW - Measurement

KW - Nonparametric Classification

KW - Online Clustering

KW - Prototype

KW - Prototypes

KW - Semantic Segmentation

KW - Semantic segmentation

KW - Semantics

KW - Transformers

KW - Vectors

UR - http://www.scopus.com/inward/record.url?scp=85190169315&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2024.3387116

DO - 10.1109/TPAMI.2024.3387116

M3 - Article

AN - SCOPUS:85190169315

SN - 0162-8828

SP - 1

EP - 15

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

ER -

Prototype-based Semantic Segmentation

摘要

访问文件

其它文件与链接

指纹

引用此