Cross-Image Pixel Contrasting for Semantic Segmentation

Tianfei Zhou; Wenguan Wang

doi:10.1109/TPAMI.2024.3367952

School of Computer Science and Technology

Zhejiang University

Research output: Contribution to journal › Article › peer-review

14 Citations (Scopus)

Abstract

This work studies the problem of image semantic segmentation. Current approaches focus mainly on mining “local” context, i.e., dependencies between pixels within individual images, by specifically-designed, context aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization objectives (e.g., IoU-like loss). However, they ignore “global” context of the training data, i.e., rich semantic relations between pixels across different images. Inspired by recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive algorithm, dubbed as PiCo, for semantic segmentation in the fully supervised learning setting. The core idea is to enforce pixel embeddings belonging to a same semantic class to be more similar than embeddings from different classes. It raises a pixel-wise metric learning paradigm for semantic segmentation, by explicitly exploring the structures of labeled pixels, which were rarely studied before. Our training algorithm is compatible with modern segmentation solutions without extra overhead during testing. We experimentally show that, with famous segmentation models (i.e., DeepLabV3, HRNet, OCRNet, SegFormer, Segmenter, MaskFormer) and backbones (i.e., MobileNet, ResNet, HRNet, MiT, ViT), our algorithm brings consistent performance improvements across diverse datasets (i.e., Cityscapes, ADE20K, PASCAL-Context, COCO-Stuff, CamVid). We expect that this work will encourage our community to rethink the current de facto training paradigm in semantic segmentation. Our code is available at https://github.com/tfzhou/ContrastiveSeg.
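
The core idea stated in the abstract (pull together pixel embeddings of the same class, push apart those of different classes) can be illustrated with a generic supervised, InfoNCE-style loss. The sketch below is not taken from the paper or its repository: the function name `pixel_contrast_loss`, the `temperature` value, and the choice to contrast every sampled pixel against all others are assumptions made for illustration, and the authors' actual loss and sampling scheme may differ.

```python
import numpy as np

def pixel_contrast_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over pixel embeddings (illustrative only).

    embeddings: (N, D) array, one row per sampled pixel; rows may come from
                different images, which is what makes the contrast cross-image.
    labels:     (N,) integer class id of each pixel.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    n = labels.shape[0]

    # Positives of an anchor: other pixels with the same class label.
    self_mask = np.eye(n, dtype=bool)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0  # no anchor has a positive, so there is nothing to contrast

    # Cosine similarity between all pixel pairs, scaled by the temperature.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    logits = np.where(self_mask, -np.inf, logits)  # an anchor never contrasts with itself

    # Numerically stable log-softmax over the other pixels.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average log-probability assigned to the positives of each valid anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())
```

Under these assumptions, the loss is small when same-class pixels already cluster together in embedding space and large when they do not. Such a term only affects training, which is consistent with the abstract's statement that no extra overhead is added during testing.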

Original language	English
Pages (from-to)	1-15
Number of pages	15
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
DOIs	https://doi.org/10.1109/TPAMI.2024.3367952
Publication status	Accepted/In press - 2024

Keywords

Contrastive Learning
Cross-Image Context
Image segmentation
Measurement
Metric Learning
Self-supervised learning
Semantic Segmentation
Semantics
Task analysis
Training

Access to Document

10.1109/TPAMI.2024.3367952

Cite this

@article{892b033805cd45ad8e1025c2332d541e,

title = "Cross-Image Pixel Contrasting for Semantic Segmentation",

abstract = "This work studies the problem of image semantic segmentation. Current approaches focus mainly on mining “local” context, i.e., dependencies between pixels within individual images, by specifically-designed, context aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization objectives (e.g., IoU-like loss). However, they ignore “global” context of the training data, i.e., rich semantic relations between pixels across different images. Inspired by recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive algorithm, dubbed as PiCo, for semantic segmentation in the fully supervised learning setting. The core idea is to enforce pixel embeddings belonging to a same semantic class to be more similar than embeddings from different classes. It raises a pixel-wise metric learning paradigm for semantic segmentation, by explicitly exploring the structures of labeled pixels, which were rarely studied before. Our training algorithm is compatible with modern segmentation solutions without extra overhead during testing. We experimentally show that, with famous segmentation models (i.e., DeepLabV3, HRNet, OCRNet, SegFormer, Segmenter, MaskFormer) and backbones (i.e., MobileNet, ResNet, HRNet, MiT, ViT), our algorithm brings consistent performance improvements across diverse datasets (i.e., Cityscapes, ADE20K, PASCAL-Context, COCO-Stuff, CamVid). We expect that this work will encourage our community to rethink the current de facto training paradigm in semantic segmentation. Our code is available at https://github.com/tfzhou/ContrastiveSeg.",

keywords = "Contrastive Learning, Cross-Image Context, Image segmentation, Measurement, Metric Learning, Self-supervised learning, Semantic Segmentation, Semantics, Task analysis, Training",

author = "Tianfei Zhou and Wenguan Wang",

note = "Publisher Copyright: IEEE",

year = "2024",

doi = "10.1109/TPAMI.2024.3367952",

language = "English",

pages = "1--15",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - Cross-Image Pixel Contrasting for Semantic Segmentation

AU - Zhou, Tianfei

AU - Wang, Wenguan

N1 - Publisher Copyright: IEEE

PY - 2024

Y1 - 2024

N2 - This work studies the problem of image semantic segmentation. Current approaches focus mainly on mining “local” context, i.e., dependencies between pixels within individual images, by specifically-designed, context aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization objectives (e.g., IoU-like loss). However, they ignore “global” context of the training data, i.e., rich semantic relations between pixels across different images. Inspired by recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive algorithm, dubbed as PiCo, for semantic segmentation in the fully supervised learning setting. The core idea is to enforce pixel embeddings belonging to a same semantic class to be more similar than embeddings from different classes. It raises a pixel-wise metric learning paradigm for semantic segmentation, by explicitly exploring the structures of labeled pixels, which were rarely studied before. Our training algorithm is compatible with modern segmentation solutions without extra overhead during testing. We experimentally show that, with famous segmentation models (i.e., DeepLabV3, HRNet, OCRNet, SegFormer, Segmenter, MaskFormer) and backbones (i.e., MobileNet, ResNet, HRNet, MiT, ViT), our algorithm brings consistent performance improvements across diverse datasets (i.e., Cityscapes, ADE20K, PASCAL-Context, COCO-Stuff, CamVid). We expect that this work will encourage our community to rethink the current de facto training paradigm in semantic segmentation. Our code is available at https://github.com/tfzhou/ContrastiveSeg.

KW - Contrastive Learning

KW - Cross-Image Context

KW - Image segmentation

KW - Measurement

KW - Metric Learning

KW - Self-supervised learning

KW - Semantic Segmentation

KW - Semantics

KW - Task analysis

KW - Training

UR - http://www.scopus.com/inward/record.url?scp=85186089686&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2024.3367952

DO - 10.1109/TPAMI.2024.3367952

M3 - Article

AN - SCOPUS:85186089686

SN - 0162-8828

SP - 1

EP - 15

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

ER -
