DRNet: Double Recalibration Network for Few-Shot Semantic Segmentation

Guangyu Gao; Zhiyuan Fang; Cen Han; Yunchao Wei; Chi Harold Liu; Shuicheng Yan

doi:10.1109/TIP.2022.3215905

DRNet: Double Recalibration Network for Few-Shot Semantic Segmentation

Guangyu Gao^*, Zhiyuan Fang, Cen Han, Yunchao Wei, Chi Harold Liu, Shuicheng Yan

^*此作品的通讯作者

计算机学院

科研成果: 期刊稿件 › 文章 › 同行评审

14 引用（Scopus）

摘要

Few-shot segmentation aims at learning to segment query images guided by only a few annotated images from the support set. Previous methods rely on mining the feature embedding similarity across the query and the support images to achieve successful segmentation. However, these models tend to perform badly in cases where the query instances have a large variance from the support ones. To enhance model robustness against such intra-class variance, we propose a Double Recalibration Network (DRNet) with two recalibration modules, i.e., the Self-adapted Recalibration (SR) module and the Cross-attended Recalibration (CR) module. In particular, beyond learning robust feature embedding for pixel-wise comparison between support and query as in conventional methods, the DRNet further exploits semantic-aware knowledge embedded in the query image to help segment itself, which we call 'self-adapted recalibration'. More specifically, DRNet first employs guidance from the support set to roughly predict an incomplete but correct initial object region for the query image, and then reversely uses the feature embedding extracted from the incomplete object region to segment the query image. Also, we devise a CR module to refine the feature representation of the query image by propagating the underlying knowledge embedded in the support image's foreground to the query. Instead of foreground global pooling, we refine the response at each pixel in the query feature map by attending to all foreground pixels in the support feature map and taking the weighted average by their similarity; meanwhile, feature maps of the query image are also added back to weighted feature maps as a residual connection. Our DRNet can effectively address the intra-class variance under the few-shot setting with such two recalibration modules, and mine more accurate target regions for query images. We conduct extensive experiments on the popular benchmarks PASCAL- 5i and COCO- 20i. The DRNet with the best configuration achieves the mIoU of 63.6% and 64.9% on PASCAL- 5i and 44.7% and 49.6% on COCO- 20i for 1-shot and 5-shot settings respectively, significantly outperforming the state-of-the-arts without any bells and whistles. Code is available at: https://github.com/fangzy97/drnet.

源语言	英语
页（从-至）	6733-6746
页数	14
期刊	IEEE Transactions on Image Processing
卷	31
DOI	https://doi.org/10.1109/TIP.2022.3215905
出版状态	已出版 - 2022

访问文件

10.1109/TIP.2022.3215905

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{d0ef2f51b7de419ebc3d3d9453de3718,

title = "DRNet: Double Recalibration Network for Few-Shot Semantic Segmentation",

abstract = "Few-shot segmentation aims at learning to segment query images guided by only a few annotated images from the support set. Previous methods rely on mining the feature embedding similarity across the query and the support images to achieve successful segmentation. However, these models tend to perform badly in cases where the query instances have a large variance from the support ones. To enhance model robustness against such intra-class variance, we propose a Double Recalibration Network (DRNet) with two recalibration modules, i.e., the Self-adapted Recalibration (SR) module and the Cross-attended Recalibration (CR) module. In particular, beyond learning robust feature embedding for pixel-wise comparison between support and query as in conventional methods, the DRNet further exploits semantic-aware knowledge embedded in the query image to help segment itself, which we call 'self-adapted recalibration'. More specifically, DRNet first employs guidance from the support set to roughly predict an incomplete but correct initial object region for the query image, and then reversely uses the feature embedding extracted from the incomplete object region to segment the query image. Also, we devise a CR module to refine the feature representation of the query image by propagating the underlying knowledge embedded in the support image's foreground to the query. Instead of foreground global pooling, we refine the response at each pixel in the query feature map by attending to all foreground pixels in the support feature map and taking the weighted average by their similarity; meanwhile, feature maps of the query image are also added back to weighted feature maps as a residual connection. Our DRNet can effectively address the intra-class variance under the few-shot setting with such two recalibration modules, and mine more accurate target regions for query images. We conduct extensive experiments on the popular benchmarks PASCAL- 5i and COCO- 20i. The DRNet with the best configuration achieves the mIoU of 63.6% and 64.9% on PASCAL- 5i and 44.7% and 49.6% on COCO- 20i for 1-shot and 5-shot settings respectively, significantly outperforming the state-of-the-arts without any bells and whistles. Code is available at: https://github.com/fangzy97/drnet.",

keywords = "Few-shot learning, recalibration, semantic segmentation, weakly-supervised learning",

author = "Guangyu Gao and Zhiyuan Fang and Cen Han and Yunchao Wei and Liu, {Chi Harold} and Shuicheng Yan",

note = "Publisher Copyright: {\textcopyright} 1992-2012 IEEE.",

year = "2022",

doi = "10.1109/TIP.2022.3215905",

language = "English",

volume = "31",

pages = "6733--6746",

journal = "IEEE Transactions on Image Processing",

issn = "1057-7149",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - DRNet

T2 - Double Recalibration Network for Few-Shot Semantic Segmentation

AU - Gao, Guangyu

AU - Fang, Zhiyuan

AU - Han, Cen

AU - Wei, Yunchao

AU - Liu, Chi Harold

AU - Yan, Shuicheng

PY - 2022

Y1 - 2022

N2 - Few-shot segmentation aims at learning to segment query images guided by only a few annotated images from the support set. Previous methods rely on mining the feature embedding similarity across the query and the support images to achieve successful segmentation. However, these models tend to perform badly in cases where the query instances have a large variance from the support ones. To enhance model robustness against such intra-class variance, we propose a Double Recalibration Network (DRNet) with two recalibration modules, i.e., the Self-adapted Recalibration (SR) module and the Cross-attended Recalibration (CR) module. In particular, beyond learning robust feature embedding for pixel-wise comparison between support and query as in conventional methods, the DRNet further exploits semantic-aware knowledge embedded in the query image to help segment itself, which we call 'self-adapted recalibration'. More specifically, DRNet first employs guidance from the support set to roughly predict an incomplete but correct initial object region for the query image, and then reversely uses the feature embedding extracted from the incomplete object region to segment the query image. Also, we devise a CR module to refine the feature representation of the query image by propagating the underlying knowledge embedded in the support image's foreground to the query. Instead of foreground global pooling, we refine the response at each pixel in the query feature map by attending to all foreground pixels in the support feature map and taking the weighted average by their similarity; meanwhile, feature maps of the query image are also added back to weighted feature maps as a residual connection. Our DRNet can effectively address the intra-class variance under the few-shot setting with such two recalibration modules, and mine more accurate target regions for query images. We conduct extensive experiments on the popular benchmarks PASCAL- 5i and COCO- 20i. The DRNet with the best configuration achieves the mIoU of 63.6% and 64.9% on PASCAL- 5i and 44.7% and 49.6% on COCO- 20i for 1-shot and 5-shot settings respectively, significantly outperforming the state-of-the-arts without any bells and whistles. Code is available at: https://github.com/fangzy97/drnet.

AB - Few-shot segmentation aims at learning to segment query images guided by only a few annotated images from the support set. Previous methods rely on mining the feature embedding similarity across the query and the support images to achieve successful segmentation. However, these models tend to perform badly in cases where the query instances have a large variance from the support ones. To enhance model robustness against such intra-class variance, we propose a Double Recalibration Network (DRNet) with two recalibration modules, i.e., the Self-adapted Recalibration (SR) module and the Cross-attended Recalibration (CR) module. In particular, beyond learning robust feature embedding for pixel-wise comparison between support and query as in conventional methods, the DRNet further exploits semantic-aware knowledge embedded in the query image to help segment itself, which we call 'self-adapted recalibration'. More specifically, DRNet first employs guidance from the support set to roughly predict an incomplete but correct initial object region for the query image, and then reversely uses the feature embedding extracted from the incomplete object region to segment the query image. Also, we devise a CR module to refine the feature representation of the query image by propagating the underlying knowledge embedded in the support image's foreground to the query. Instead of foreground global pooling, we refine the response at each pixel in the query feature map by attending to all foreground pixels in the support feature map and taking the weighted average by their similarity; meanwhile, feature maps of the query image are also added back to weighted feature maps as a residual connection. Our DRNet can effectively address the intra-class variance under the few-shot setting with such two recalibration modules, and mine more accurate target regions for query images. We conduct extensive experiments on the popular benchmarks PASCAL- 5i and COCO- 20i. The DRNet with the best configuration achieves the mIoU of 63.6% and 64.9% on PASCAL- 5i and 44.7% and 49.6% on COCO- 20i for 1-shot and 5-shot settings respectively, significantly outperforming the state-of-the-arts without any bells and whistles. Code is available at: https://github.com/fangzy97/drnet.

KW - Few-shot learning

KW - recalibration

KW - semantic segmentation

KW - weakly-supervised learning

UR - http://www.scopus.com/inward/record.url?scp=85141004326&partnerID=8YFLogxK

U2 - 10.1109/TIP.2022.3215905

DO - 10.1109/TIP.2022.3215905

M3 - Article

C2 - 36282824

AN - SCOPUS:85141004326

SN - 1057-7149

VL - 31

SP - 6733

EP - 6746

JO - IEEE Transactions on Image Processing

JF - IEEE Transactions on Image Processing

ER -

DRNet: Double Recalibration Network for Few-Shot Semantic Segmentation

摘要

访问文件

其它文件与链接

指纹

引用此