Rethinking Semantic Segmentation: A Prototype View

Tianfei Zhou; Wenguan Wang; Ender Konukoglu; Luc Van Goo

doi:10.1109/CVPR52688.2022.00261

Rethinking Semantic Segmentation: A Prototype View

Tianfei Zhou, Wenguan Wang^*, Ender Konukoglu, Luc Van Goo

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

219 Citations (Scopus)

Abstract

Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such parametric segmentation regime, and proposes a nonparametric alternative based on non-learnable prototypes. Instead of prior methods learning a single weight/query vector for each class in a fully parametric manner, our model represents each class as a set of non-learnable prototypes, relying solely on the mean fea-tures of several training pixels within that class. The dense prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space, by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to handle arbitrary number of classes with a constant amount of learnable parameters. We empirically show that, with FCN based and attention based segmentation models (i.e., HR-Net, Swin, SegFormer) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework yields compel-ling results over several datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), and performs well in the large-vocabulary situation. We expect this work will provoke a rethink of the current de facto semantic segmentation model design.

Original language	English
Title of host publication	Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Publisher	IEEE Computer Society
Pages	2572-2583
Number of pages	12
ISBN (Electronic)	9781665469463
DOIs	https://doi.org/10.1109/CVPR52688.2022.00261
Publication status	Published - 2022
Externally published	Yes
Event	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States Duration: 19 Jun 2022 → 24 Jun 2022

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume	2022-June
ISSN (Print)	1063-6919

Conference

Conference	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/Territory	United States
City	New Orleans
Period	19/06/22 → 24/06/22

Keywords

Scene analysis and understanding
Segmentation
grouping and shape analysis

Access to Document

10.1109/CVPR52688.2022.00261

Cite this

Zhou, T., Wang, W., Konukoglu, E., & Van Goo, L. (2022). Rethinking Semantic Segmentation: A Prototype View. In Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 (pp. 2572-2583). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Vol. 2022-June). IEEE Computer Society. https://doi.org/10.1109/CVPR52688.2022.00261

@inproceedings{fc5be234d4574bbea36db522bfba7baa,

title = "Rethinking Semantic Segmentation: A Prototype View",

abstract = "Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such parametric segmentation regime, and proposes a nonparametric alternative based on non-learnable prototypes. Instead of prior methods learning a single weight/query vector for each class in a fully parametric manner, our model represents each class as a set of non-learnable prototypes, relying solely on the mean fea-tures of several training pixels within that class. The dense prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space, by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to handle arbitrary number of classes with a constant amount of learnable parameters. We empirically show that, with FCN based and attention based segmentation models (i.e., HR-Net, Swin, SegFormer) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework yields compel-ling results over several datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), and performs well in the large-vocabulary situation. We expect this work will provoke a rethink of the current de facto semantic segmentation model design.",

keywords = "Scene analysis and understanding, Segmentation, grouping and shape analysis",

author = "Tianfei Zhou and Wenguan Wang and Ender Konukoglu and {Van Goo}, Luc",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 ; Conference date: 19-06-2022 Through 24-06-2022",

year = "2022",

doi = "10.1109/CVPR52688.2022.00261",

language = "English",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "2572--2583",

booktitle = "Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022",

address = "United States",

}

Zhou, T, Wang, W, Konukoglu, E & Van Goo, L 2022, Rethinking Semantic Segmentation: A Prototype View. in Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2022-June, IEEE Computer Society, pp. 2572-2583, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, United States, 19/06/22. https://doi.org/10.1109/CVPR52688.2022.00261

Rethinking Semantic Segmentation: A Prototype View. / Zhou, Tianfei; Wang, Wenguan; Konukoglu, Ender et al.
Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022. IEEE Computer Society, 2022. p. 2572-2583 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Vol. 2022-June).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

TY - GEN

T1 - Rethinking Semantic Segmentation

T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022

AU - Zhou, Tianfei

AU - Wang, Wenguan

AU - Konukoglu, Ender

AU - Van Goo, Luc

PY - 2022

Y1 - 2022

N2 - Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such parametric segmentation regime, and proposes a nonparametric alternative based on non-learnable prototypes. Instead of prior methods learning a single weight/query vector for each class in a fully parametric manner, our model represents each class as a set of non-learnable prototypes, relying solely on the mean fea-tures of several training pixels within that class. The dense prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space, by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to handle arbitrary number of classes with a constant amount of learnable parameters. We empirically show that, with FCN based and attention based segmentation models (i.e., HR-Net, Swin, SegFormer) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework yields compel-ling results over several datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), and performs well in the large-vocabulary situation. We expect this work will provoke a rethink of the current de facto semantic segmentation model design.

AB - Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such parametric segmentation regime, and proposes a nonparametric alternative based on non-learnable prototypes. Instead of prior methods learning a single weight/query vector for each class in a fully parametric manner, our model represents each class as a set of non-learnable prototypes, relying solely on the mean fea-tures of several training pixels within that class. The dense prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space, by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to handle arbitrary number of classes with a constant amount of learnable parameters. We empirically show that, with FCN based and attention based segmentation models (i.e., HR-Net, Swin, SegFormer) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework yields compel-ling results over several datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), and performs well in the large-vocabulary situation. We expect this work will provoke a rethink of the current de facto semantic segmentation model design.

KW - Scene analysis and understanding

KW - Segmentation

KW - grouping and shape analysis

UR - http://www.scopus.com/inward/record.url?scp=85141804234&partnerID=8YFLogxK

U2 - 10.1109/CVPR52688.2022.00261

DO - 10.1109/CVPR52688.2022.00261

M3 - Conference contribution

AN - SCOPUS:85141804234

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 2572

EP - 2583

BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022

PB - IEEE Computer Society

Y2 - 19 June 2022 through 24 June 2022

ER -

Rethinking Semantic Segmentation: A Prototype View

Abstract

Publication series

Conference

Keywords

Access to Document

Other files and links

Fingerprint

Cite this