TY - JOUR
T1 - Superpixel-based Visual Feature Enhancement for Compositional Zero-Shot Learning
AU - Du, Wenlong
AU - Bao, Xianglin
AU - Xu, Xiaofeng
AU - Lu, Xingyu
AU - Zhang, Ruiheng
N1 - Publisher Copyright: © 2025 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
PY - 2026/3
Y1 - 2026/3
AB - Compositional Zero-Shot Learning (CZSL) is a challenging machine learning task that recognizes new compositional concepts by leveraging learned concepts such as attribute-object combinations. Previous research depended on visual attributes derived from networks pre-trained for object categorization. These approaches are limited in capturing the subtleties of attribute distinctions and fail to account for the critical contextual interactions between attributes and visual objects. To address these limitations, we draw inspiration from superpixels and introduce the Superpixel-based Visual Feature Enhancement (SVFE) model for the compositional zero-shot learning task. In the proposed approach, an innovative superpixel integration strategy is designed to meticulously disentangle and represent the visual concepts of states and objects with finer granularity. Then, we introduce a novel Fourier spectral layer that harnesses the frequency domain to capture global image features and dynamically adjusts component contributions to enhance the local detail representation. Furthermore, we propose a long-range fusion module to optimize the synergy between the local and global features, thereby fortifying the model’s acuity in discerning intricate compositional relationships. Through rigorous experiments on standard CZSL benchmark datasets, the proposed SVFE model demonstrates significant improvement over other state-of-the-art methods in both open-world and closed-world CZSL scenarios.
KW - Attention fusion
KW - Attribute-object combinations
KW - Compositional zero-shot learning
KW - Fourier spectral layer
KW - Superpixel segmentation
UR - https://www.scopus.com/pages/publications/105017850368
U2 - 10.1016/j.ipm.2025.104414
DO - 10.1016/j.ipm.2025.104414
M3 - Article
AN - SCOPUS:105017850368
SN - 0306-4573
VL - 63
JO - Information Processing and Management
JF - Information Processing and Management
IS - 2
M1 - 104414
ER -