TY - JOUR
T1 - It Takes Two: Multi-frequency Perception with Complementary Fusion Network for Complex Scene Segmentation
AU - Zhang, Jin
AU - Zhang, Ruiheng
AU - Cao, Zhe
AU - Xu, Lixin
AU - Chen, Xi
AU - Xu, Min
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Complex scene segmentation aims to segment objects with intricate details or those concealed within the background. Despite significant advancements, a persistent challenge remains: accurately identifying object edges in backgrounds with high inherent similarity and complex structures. To address this, we identify the prevalent spectral bias in image segmentation, where networks preferentially learn low-frequency information, as a key impediment to recognizing and learning object edges, which are rich in high-frequency details. To mitigate this bias, we propose MCNet, a segmentation framework designed to promote balanced frequency learning. MCNet comprises two primary components: multi-frequency perception (MP), which independently captures high-frequency details and low-frequency structural components of objects, and complementary fusion (CF), which intelligently fuses these distinct frequency features through learnable, adaptive mechanisms. Crucially, MCNet employs a novel frequency-aware consistency adversarial loss to explicitly guide the learning across different frequency bands. MCNet effectively integrates MP and CF, enhancing the detection of high-frequency details and low-frequency structures, thereby alleviating challenges posed by spectral bias. We evaluate the proposed method on complex scene segmentation tasks, including camouflaged object detection and dichotomous image segmentation. Through extensive comparisons with 31 existing methods across 8 benchmark datasets, we demonstrate the superiority of the proposed method.
AB - Complex scene segmentation aims to segment objects with intricate details or those concealed within the background. Despite significant advancements, a persistent challenge remains: accurately identifying object edges in backgrounds with high inherent similarity and complex structures. To address this, we identify the prevalent spectral bias in image segmentation, where networks preferentially learn low-frequency information, as a key impediment to recognizing and learning object edges, which are rich in high-frequency details. To mitigate this bias, we propose MCNet, a segmentation framework designed to promote balanced frequency learning. MCNet comprises two primary components: multi-frequency perception (MP), which independently captures high-frequency details and low-frequency structural components of objects, and complementary fusion (CF), which intelligently fuses these distinct frequency features through learnable, adaptive mechanisms. Crucially, MCNet employs a novel frequency-aware consistency adversarial loss to explicitly guide the learning across different frequency bands. MCNet effectively integrates MP and CF, enhancing the detection of high-frequency details and low-frequency structures, thereby alleviating challenges posed by spectral bias. We evaluate the proposed method on complex scene segmentation tasks, including camouflaged object detection and dichotomous image segmentation. Through extensive comparisons with 31 existing methods across 8 benchmark datasets, we demonstrate the superiority of the proposed method.
KW - Complementary fusion
KW - Complex scene segmentation
KW - Multi-frequency perception
KW - Spectral bias
UR - https://www.scopus.com/pages/publications/105020407529
U2 - 10.1109/TCSVT.2025.3626574
DO - 10.1109/TCSVT.2025.3626574
M3 - Article
AN - SCOPUS:105020407529
SN - 1051-8215
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
ER -