TY - JOUR
T1 - SynerNet
T2 - Broad-to-precise CAM synergy for weakly supervised semantic segmentation
AU - Wang, Zhonggai
AU - Gao, Guangyu
AU - Li, Zhuoshu
AU - Qin, A. K.
N1 - Publisher Copyright: © 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
PY - 2026/10
Y1 - 2026/10
N2 - Weakly Supervised Semantic Segmentation (WSSS) remains highly challenging because image-level supervision typically produces class activation maps (CAMs) that are incomplete or noisy when used as pixel-level pseudo-labels. Despite the architectural efficiency of one-stage approaches, they are often hindered by tight encoder-label coupling: CAMs and segmentation predictions are derived from the same encoder and optimized jointly, leading to the propagation and reinforcement of initial CAM inaccuracies by the segmentation outputs. To circumvent this limitation, we propose SynerNet, a one-stage dual-branch framework that explicitly mandates complementary yet synergistic objectives: one branch generates broad pseudo-labels to enhance coverage, while the other produces precise pseudo-labels to sharpen localization. With such pseudo-labels, the segmentation network yields simultaneously comprehensive and accurate predictions. The broad branch (B-CAM) leverages global attention to expand foreground coverage by guiding ambiguous regions toward likely foreground, whereas the precise branch (P-CAM) emphasizes fine localization by encouraging unreliable pixels toward the background. Through cross-supervision, the two branches effectively decouple the optimization process, alleviating the risk of error reinforcement inherent in direct coupling. To further integrate their strengths, we introduce a confidence matrix derived from multi-scale ViT features, in which pixels consistently classified across layers are treated as high-confidence, while inconsistent ones are marked as uncertain. This enables a confidence-guided fusion strategy that directly adopts reliable predictions and adaptively blends uncertain regions using contributions from both branches. Such a complementary design mitigates error reinforcement and promotes mutually beneficial learning, enabling the network to generate high-fidelity pseudo-labels in a fully end-to-end manner. 
By combining branch-specific objectives with confidence-guided fusion, SynerNet produces pseudo-labels that are both complete and precise, achieves state-of-the-art performance on PASCAL VOC 2012 and COCO 2014, and demonstrates the effectiveness of one-stage co-training for high-quality weakly supervised segmentation. The code is publicly available at: https://github.com/ZhonggaiWang/DEFormer.
KW - CAM
KW - Co-training
KW - Dual-branch
KW - Pseudo-label
KW - Weakly supervised semantic segmentation
UR - https://www.scopus.com/pages/publications/105037755240
U2 - 10.1016/j.neunet.2026.109024
DO - 10.1016/j.neunet.2026.109024
M3 - Article
AN - SCOPUS:105037755240
SN - 0893-6080
VL - 202
JO - Neural Networks
JF - Neural Networks
M1 - 109024
ER -