Cross-Image Pixel Contrasting for Semantic Segmentation

Tianfei Zhou; Wenguan Wang

doi:10.1109/TPAMI.2024.3367952

College of Computer Science

Zhejiang University

Research output: Contribution to journal › Article › Peer-reviewed

27 Citations (Scopus)

Abstract

This work studies the problem of image semantic segmentation. Current approaches focus mainly on mining “local” context, i.e., dependencies between pixels within individual images, through specifically designed context aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization objectives (e.g., IoU-like loss). However, they ignore the “global” context of the training data, i.e., rich semantic relations between pixels across different images. Inspired by recent advances in unsupervised contrastive representation learning, we propose a pixel-wise contrastive algorithm, dubbed PiCo, for semantic segmentation in the fully supervised learning setting. The core idea is to enforce pixel embeddings belonging to the same semantic class to be more similar than embeddings from different classes. This yields a pixel-wise metric learning paradigm for semantic segmentation that explicitly explores the structures of labeled pixels, which have rarely been studied before. Our training algorithm is compatible with modern segmentation solutions and adds no extra overhead during testing. We experimentally show that, with well-known segmentation models (i.e., DeepLabV3, HRNet, OCRNet, SegFormer, Segmenter, MaskFormer) and backbones (i.e., MobileNet, ResNet, HRNet, MiT, ViT), our algorithm brings consistent performance improvements across diverse datasets (i.e., Cityscapes, ADE20K, PASCAL-Context, COCO-Stuff, CamVid). We expect that this work will encourage our community to rethink the current de facto training paradigm in semantic segmentation. Our code is available at https://github.com/tfzhou/ContrastiveSeg.
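The core idea of pulling same-class pixel embeddings together and pushing different-class ones apart can be illustrated with a small, dependency-free sketch of a supervised, pixel-wise InfoNCE-style loss. This is a minimal sketch under stated assumptions, not the authors' implementation (the ContrastiveSeg repository holds that): the function names `pixel_contrast_loss`, `l2_normalize` and `dot`, the temperature of 0.1, and the toy embeddings are all hypothetical choices for illustration. Pixels are pooled into one flat list regardless of which image they came from, so positives and negatives can be drawn across images.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Project an embedding onto the unit sphere (guarding against zero norm)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pixel_contrast_loss(embeddings: Sequence[Sequence[float]],
                        labels: Sequence[int],
                        temperature: float = 0.1) -> float:
    """Hypothetical supervised pixel-wise contrastive loss (illustrative sketch).

    embeddings: pixel embeddings pooled from one or more images.
    labels:     ground-truth class of each pixel.
    For each anchor, every same-class pixel is a positive and every
    different-class pixel is a negative; each positive is contrasted
    against the summed negatives, and the result is averaged.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [k for k in range(n) if labels[k] != labels[i]]
        if not pos or not neg:
            continue  # an anchor needs at least one positive and one negative
        neg_sum = sum(math.exp(dot(z[i], z[k]) / temperature) for k in neg)
        anchor_loss = 0.0
        for j in pos:
            p = math.exp(dot(z[i], z[j]) / temperature)
            anchor_loss += -math.log(p / (p + neg_sum))
        total += anchor_loss / len(pos)
        count += 1
    return total / count if count else 0.0


# Toy usage: pixels from two hypothetical images (pixels 0-1 and 2-3),
# class 0 and class 1. "aligned" places same-class pixels together.
labels = [0, 1, 0, 1]
aligned = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(pixel_contrast_loss(aligned, labels) < pixel_contrast_loss(mixed, labels))  # True
```

Because the loss only shapes the embedding space during training, a sketch like this would be added alongside the usual per-pixel cross-entropy and dropped at inference, which is consistent with the abstract's claim of no extra test-time overhead.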

Original language	English
Pages (from-to)	1-15
Number of pages	15
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
DOI	https://doi.org/10.1109/TPAMI.2024.3367952
Publication status	Accepted/In press - 2024

Cite this

Zhou, T., & Wang, W. (Accepted/In press). Cross-Image Pixel Contrasting for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-15. https://doi.org/10.1109/TPAMI.2024.3367952

@article{892b033805cd45ad8e1025c2332d541e,

title = "Cross-Image Pixel Contrasting for Semantic Segmentation",

keywords = "Contrastive Learning, Cross-Image Context, Image segmentation, Measurement, Metric Learning, Self-supervised learning, Semantic Segmentation, Semantic segmentation, Semantics, Task analysis, Training",

author = "Tianfei Zhou and Wenguan Wang",

note = "Publisher Copyright: IEEE",

year = "2024",

doi = "10.1109/TPAMI.2024.3367952",

language = "English",

pages = "1--15",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - Cross-Image Pixel Contrasting for Semantic Segmentation

AU - Zhou, Tianfei

AU - Wang, Wenguan

N1 - Publisher Copyright: IEEE

PY - 2024

Y1 - 2024

AB - This work studies the problem of image semantic segmentation. Current approaches focus mainly on mining “local” context, i.e., dependencies between pixels within individual images, by specifically-designed, context aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization objectives (e.g., IoU-like loss). However, they ignore “global” context of the training data, i.e., rich semantic relations between pixels across different images. Inspired by recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive algorithm, dubbed as PiCo, for semantic segmentation in the fully supervised learning setting. The core idea is to enforce pixel embeddings belonging to a same semantic class to be more similar than embeddings from different classes. It raises a pixel-wise metric learning paradigm for semantic segmentation, by explicitly exploring the structures of labeled pixels, which were rarely studied before. Our training algorithm is compatible with modern segmentation solutions without extra overhead during testing. We experimentally show that, with famous segmentation models (i.e., DeepLabV3, HRNet, OCRNet, SegFormer, Segmenter, MaskFormer) and backbones (i.e., MobileNet, ResNet, HRNet, MiT, ViT), our algorithm brings consistent performance improvements across diverse datasets (i.e., Cityscapes, ADE20K, PASCAL-Context, COCO-Stuff, CamVid). We expect that this work will encourage our community to rethink the current de facto training paradigm in semantic segmentation. Our code is available at https://github.com/tfzhou/ContrastiveSeg.

KW - Contrastive Learning

KW - Cross-Image Context

KW - Image segmentation

KW - Measurement

KW - Metric Learning

KW - Self-supervised learning

KW - Semantic Segmentation

KW - Semantic segmentation

KW - Semantics

KW - Task analysis

KW - Training

UR - http://www.scopus.com/inward/record.url?scp=85186089686&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2024.3367952

DO - 10.1109/TPAMI.2024.3367952

M3 - Article

AN - SCOPUS:85186089686

SN - 0162-8828

SP - 1

EP - 15

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

ER -
