“Car or Bus?” CLearSeg: CLIP-Enhanced Discrimination Among Resembling Classes for Few-Shot Semantic Segmentation

Anqi Zhang; Guangyu Gao; Zhuocheng Lv; Yukun An

doi:10.1007/978-3-031-53305-1_14

Anqi Zhang, Guangyu Gao^*, Zhuocheng Lv, Yukun An

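^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review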

Abstract

Few-shot semantic segmentation aims at learning to segment query images of unseen classes with the guidance of limited segmented support examples. However, existing models tend to confuse the resembling classes (e.g., ‘car’ and ‘bus’) thus generating erroneous predictions. To address this, we propose the CLIP-enhanced discrimination among resembling classes for few-shot semantic Segmentation (CLearSeg), which leverages information beyond support images, including the class name, through Contrastive Language-Image Pretraining (CLIP), to discriminate between resembling classes. Firstly, we modify the CLIP structure and design the Sliding Attention Pooling (SAP) to construct the Text-Driven Activation (TDA) module, learning the Class-Specific Activation (CSA) maps with class names. Since the semantic information is explicitly involved by the class name, the CSA maps exhibit clear distinctions among resembling classes. Meanwhile, to enrich fine-grained features ensuring distinguishability, the Multi-Level Correlation (MLC) module is designed to extract multi-level features of support and query images and generate various correlation maps. We further applied a decoder to fuse the CSA map and correlation maps with encoded features and obtain the final prediction. Experiments on the Pascal-5ⁱ and COCO-20ⁱ datasets have shown that CLearSeg outperforms previous methods, achieving the mIoU of 69.2 % and 48.9 % for 1-shot segmentation, particularly in distinguishing objects from resembling classes.
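The abstract reports results as mIoU (mean intersection-over-union). As a rough illustration of that metric only, the sketch below accumulates intersection and union per class over evaluation episodes and then averages the per-class IoU values. The function name `mean_iou`, the episode tuple layout, and the toy masks are assumptions made for this example; the exact evaluation protocol used in the paper is not described on this page.

```python
from collections import defaultdict


def mean_iou(episodes):
    """Mean IoU over classes (hypothetical helper, not the authors' code).

    episodes: iterable of (class_name, pred_mask, gt_mask), where each mask
    is an equal-sized 2D list of 0/1 values (1 = foreground).
    """
    inter = defaultdict(int)  # per-class intersection pixel count
    union = defaultdict(int)  # per-class union pixel count
    for cls, pred, gt in episodes:
        for pred_row, gt_row in zip(pred, gt):
            for p, g in zip(pred_row, gt_row):
                inter[cls] += 1 if (p and g) else 0
                union[cls] += 1 if (p or g) else 0
    # Skip classes whose union is empty to avoid division by zero.
    ious = [inter[c] / union[c] for c in union if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy data: two 2x2 episodes, one per class (values are made up).
episodes = [
    ("car", [[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # IoU = 1/2
    ("bus", [[0, 1], [0, 1]], [[0, 1], [0, 1]]),  # IoU = 2/2
]
print(mean_iou(episodes))  # 0.75
```

Averaging per class rather than per image keeps frequent classes from dominating the score, which matters when the classes of interest are easily confused with one another.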

Original language	English
Title of host publication	MultiMedia Modeling - 30th International Conference, MMM 2024, Proceedings
Editors	Stevan Rudinac, Marcel Worring, Cynthia Liem, Alan Hanjalic, Björn Þór Jónsson, Yoko Yamakata, Bei Liu
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	172-186
Number of pages	15
ISBN (Print)	9783031533044
DOI	https://doi.org/10.1007/978-3-031-53305-1_14
Publication status	Published - 2024
Event	30th International Conference on MultiMedia Modeling, MMM 2024 - Amsterdam, Netherlands. Duration: 29 Jan 2024 → 2 Feb 2024

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14554 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	30th International Conference on MultiMedia Modeling, MMM 2024
Country/Territory	Netherlands
City	Amsterdam
Period	29/01/24 → 2/02/24

Access to document

10.1007/978-3-031-53305-1_14

Other files and links

Link to publication in Scopus

Cite this

Zhang, A., Gao, G., Lv, Z., & An, Y. (2024). “Car or Bus?” CLearSeg: CLIP-Enhanced Discrimination Among Resembling Classes for Few-Shot Semantic Segmentation. In S. Rudinac, M. Worring, C. Liem, A. Hanjalic, B. Þ. Jónsson, Y. Yamakata, & B. Liu (Eds.), MultiMedia Modeling - 30th International Conference, MMM 2024, Proceedings (pp. 172-186). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14554 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-53305-1_14

Zhang, Anqi ; Gao, Guangyu ; Lv, Zhuocheng et al. / “Car or Bus?” CLearSeg : CLIP-Enhanced Discrimination Among Resembling Classes for Few-Shot Semantic Segmentation. MultiMedia Modeling - 30th International Conference, MMM 2024, Proceedings. editor / Stevan Rudinac ; Marcel Worring ; Cynthia Liem ; Alan Hanjalic ; Björn Þór Jónsson ; Yoko Yamakata ; Bei Liu. Springer Science and Business Media Deutschland GmbH, 2024. pp. 172-186 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{85ab756665c34b2d95d4af2e3073ea7a,

title = "``Car or Bus?'' CLearSeg: CLIP-Enhanced Discrimination Among Resembling Classes for Few-Shot Semantic Segmentation",

abstract = "Few-shot semantic segmentation aims at learning to segment query images of unseen classes with the guidance of limited segmented support examples. However, existing models tend to confuse the resembling classes (e.g., {\textquoteleft}car{\textquoteright} and {\textquoteleft}bus{\textquoteright}) thus generating erroneous predictions. To address this, we propose the CLIP-enhanced discrimination among resembling classes for few-shot semantic Segmentation (CLearSeg), which leverages information beyond support images, including the class name, through Contrastive Language-Image Pretraining (CLIP), to discriminate between resembling classes. Firstly, we modify the CLIP structure and design the Sliding Attention Pooling (SAP) to construct the Text-Driven Activation (TDA) module, learning the Class-Specific Activation (CSA) maps with class names. Since the semantic information is explicitly involved by the class name, the CSA maps exhibit clear distinctions among resembling classes. Meanwhile, to enrich fine-grained features ensuring distinguishability, the Multi-Level Correlation (MLC) module is designed to extract multi-level features of support and query images and generate various correlation maps. We further applied a decoder to fuse the CSA map and correlation maps with encoded features and obtain the final prediction. Experiments on the Pascal-5$^i$ and COCO-20$^i$ datasets have shown that CLearSeg outperforms previous methods, achieving the mIoU of 69.2\% and 48.9\% for 1-shot segmentation, particularly in distinguishing objects from resembling classes.",

keywords = "CLIP, Few-Shot Learning, Resembling Class, Semantic Segmentation",

author = "Anqi Zhang and Guangyu Gao and Zhuocheng Lv and Yukun An",

note = "Publisher Copyright: {\textcopyright} 2024, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 30th International Conference on MultiMedia Modeling, MMM 2024 ; Conference date: 29-01-2024 Through 02-02-2024",

year = "2024",

doi = "10.1007/978-3-031-53305-1_14",

language = "English",

isbn = "9783031533044",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "172--186",

editor = "Stevan Rudinac and Marcel Worring and Cynthia Liem and Alan Hanjalic and J{\'o}nsson, {Bj{\"o}rn P{\'o}r} and Yoko Yamakata and Bei Liu",

booktitle = "MultiMedia Modeling - 30th International Conference, MMM 2024, Proceedings",

address = "Germany",

}

Zhang, A, Gao, G, Lv, Z & An, Y 2024, “Car or Bus?” CLearSeg: CLIP-Enhanced Discrimination Among Resembling Classes for Few-Shot Semantic Segmentation. in S Rudinac, M Worring, C Liem, A Hanjalic, BÞ Jónsson, Y Yamakata & B Liu (eds), MultiMedia Modeling - 30th International Conference, MMM 2024, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14554 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 172-186, 30th International Conference on MultiMedia Modeling, MMM 2024, Amsterdam, Netherlands, 29/01/24. https://doi.org/10.1007/978-3-031-53305-1_14

“Car or Bus?” CLearSeg: CLIP-Enhanced Discrimination Among Resembling Classes for Few-Shot Semantic Segmentation. / Zhang, Anqi; Gao, Guangyu; Lv, Zhuocheng et al.
MultiMedia Modeling - 30th International Conference, MMM 2024, Proceedings. editor / Stevan Rudinac; Marcel Worring; Cynthia Liem; Alan Hanjalic; Björn Þór Jónsson; Yoko Yamakata; Bei Liu. Springer Science and Business Media Deutschland GmbH, 2024. pp. 172-186 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14554 LNCS).

TY - GEN

T1 - “Car or Bus?” CLearSeg

T2 - 30th International Conference on MultiMedia Modeling, MMM 2024

AU - Zhang, Anqi

AU - Gao, Guangyu

AU - Lv, Zhuocheng

AU - An, Yukun

PY - 2024

Y1 - 2024

N2 - Few-shot semantic segmentation aims at learning to segment query images of unseen classes with the guidance of limited segmented support examples. However, existing models tend to confuse the resembling classes (e.g., ‘car’ and ‘bus’) thus generating erroneous predictions. To address this, we propose the CLIP-enhanced discrimination among resembling classes for few-shot semantic Segmentation (CLearSeg), which leverages information beyond support images, including the class name, through Contrastive Language-Image Pretraining (CLIP), to discriminate between resembling classes. Firstly, we modify the CLIP structure and design the Sliding Attention Pooling (SAP) to construct the Text-Driven Activation (TDA) module, learning the Class-Specific Activation (CSA) maps with class names. Since the semantic information is explicitly involved by the class name, the CSA maps exhibit clear distinctions among resembling classes. Meanwhile, to enrich fine-grained features ensuring distinguishability, the Multi-Level Correlation (MLC) module is designed to extract multi-level features of support and query images and generate various correlation maps. We further applied a decoder to fuse the CSA map and correlation maps with encoded features and obtain the final prediction. Experiments on the Pascal-5i and COCO-20i datasets have shown that CLearSeg outperforms previous methods, achieving the mIoU of 69.2 % and 48.9 % for 1-shot segmentation, particularly in distinguishing objects from resembling classes.

AB - Few-shot semantic segmentation aims at learning to segment query images of unseen classes with the guidance of limited segmented support examples. However, existing models tend to confuse the resembling classes (e.g., ‘car’ and ‘bus’) thus generating erroneous predictions. To address this, we propose the CLIP-enhanced discrimination among resembling classes for few-shot semantic Segmentation (CLearSeg), which leverages information beyond support images, including the class name, through Contrastive Language-Image Pretraining (CLIP), to discriminate between resembling classes. Firstly, we modify the CLIP structure and design the Sliding Attention Pooling (SAP) to construct the Text-Driven Activation (TDA) module, learning the Class-Specific Activation (CSA) maps with class names. Since the semantic information is explicitly involved by the class name, the CSA maps exhibit clear distinctions among resembling classes. Meanwhile, to enrich fine-grained features ensuring distinguishability, the Multi-Level Correlation (MLC) module is designed to extract multi-level features of support and query images and generate various correlation maps. We further applied a decoder to fuse the CSA map and correlation maps with encoded features and obtain the final prediction. Experiments on the Pascal-5i and COCO-20i datasets have shown that CLearSeg outperforms previous methods, achieving the mIoU of 69.2 % and 48.9 % for 1-shot segmentation, particularly in distinguishing objects from resembling classes.

KW - CLIP

KW - Few-Shot Learning

KW - Resembling Class

KW - Semantic Segmentation

UR - http://www.scopus.com/inward/record.url?scp=85185703832&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-53305-1_14

DO - 10.1007/978-3-031-53305-1_14

M3 - Conference contribution

AN - SCOPUS:85185703832

SN - 9783031533044

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 172

EP - 186

BT - MultiMedia Modeling - 30th International Conference, MMM 2024, Proceedings

A2 - Rudinac, Stevan

A2 - Worring, Marcel

A2 - Liem, Cynthia

A2 - Hanjalic, Alan

A2 - Jónsson, Björn Þór

A2 - Yamakata, Yoko

A2 - Liu, Bei

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 29 January 2024 through 2 February 2024

ER -

Zhang A, Gao G, Lv Z, An Y. “Car or Bus?” CLearSeg: CLIP-Enhanced Discrimination Among Resembling Classes for Few-Shot Semantic Segmentation. In Rudinac S, Worring M, Liem C, Hanjalic A, Jónsson BÞ, Yamakata Y, Liu B, editors, MultiMedia Modeling - 30th International Conference, MMM 2024, Proceedings. Springer Science and Business Media Deutschland GmbH. 2024. p. 172-186. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-53305-1_14
