TY - GEN
T1 - “Car or Bus?” CLearSeg: CLIP-Enhanced Discrimination Among Resembling Classes for Few-Shot Semantic Segmentation
T2 - 30th International Conference on MultiMedia Modeling, MMM 2024
AU - Zhang, Anqi
AU - Gao, Guangyu
AU - Lv, Zhuocheng
AU - An, Yukun
N1 - Publisher Copyright:
© 2024, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2024
Y1 - 2024
N2 - Few-shot semantic segmentation aims at learning to segment query images of unseen classes with the guidance of only a few segmented support examples. However, existing models tend to confuse resembling classes (e.g., ‘car’ and ‘bus’), thus generating erroneous predictions. To address this, we propose CLIP-enhanced discrimination among resembling classes for few-shot semantic Segmentation (CLearSeg), which leverages information beyond the support images, including the class name, through Contrastive Language-Image Pretraining (CLIP), to discriminate between resembling classes. First, we modify the CLIP structure and design Sliding Attention Pooling (SAP) to construct the Text-Driven Activation (TDA) module, which learns Class-Specific Activation (CSA) maps from class names. Since semantic information is explicitly introduced through the class name, the CSA maps exhibit clear distinctions among resembling classes. Meanwhile, to enrich fine-grained features and ensure distinguishability, the Multi-Level Correlation (MLC) module is designed to extract multi-level features of support and query images and generate the corresponding correlation maps. We further apply a decoder to fuse the CSA map and correlation maps with encoded features and obtain the final prediction. Experiments on the Pascal-5i and COCO-20i datasets show that CLearSeg outperforms previous methods, achieving mIoU of 69.2% and 48.9%, respectively, for 1-shot segmentation, and is particularly effective in distinguishing objects of resembling classes.
AB - Few-shot semantic segmentation aims at learning to segment query images of unseen classes with the guidance of only a few segmented support examples. However, existing models tend to confuse resembling classes (e.g., ‘car’ and ‘bus’), thus generating erroneous predictions. To address this, we propose CLIP-enhanced discrimination among resembling classes for few-shot semantic Segmentation (CLearSeg), which leverages information beyond the support images, including the class name, through Contrastive Language-Image Pretraining (CLIP), to discriminate between resembling classes. First, we modify the CLIP structure and design Sliding Attention Pooling (SAP) to construct the Text-Driven Activation (TDA) module, which learns Class-Specific Activation (CSA) maps from class names. Since semantic information is explicitly introduced through the class name, the CSA maps exhibit clear distinctions among resembling classes. Meanwhile, to enrich fine-grained features and ensure distinguishability, the Multi-Level Correlation (MLC) module is designed to extract multi-level features of support and query images and generate the corresponding correlation maps. We further apply a decoder to fuse the CSA map and correlation maps with encoded features and obtain the final prediction. Experiments on the Pascal-5i and COCO-20i datasets show that CLearSeg outperforms previous methods, achieving mIoU of 69.2% and 48.9%, respectively, for 1-shot segmentation, and is particularly effective in distinguishing objects of resembling classes.
KW - CLIP
KW - Few-Shot Learning
KW - Resembling Class
KW - Semantic Segmentation
UR - http://www.scopus.com/inward/record.url?scp=85185703832&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-53305-1_14
DO - 10.1007/978-3-031-53305-1_14
M3 - Conference contribution
AN - SCOPUS:85185703832
SN - 9783031533044
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 172
EP - 186
BT - MultiMedia Modeling - 30th International Conference, MMM 2024, Proceedings
A2 - Rudinac, Stevan
A2 - Worring, Marcel
A2 - Liem, Cynthia
A2 - Hanjalic, Alan
A2 - Jónsson, Björn Þór
A2 - Yamakata, Yoko
A2 - Liu, Bei
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 29 January 2024 through 2 February 2024
ER -