TY - JOUR
T1 - Semantic Hierarchy-Aware Segmentation
AU - Li, Liulei
AU - Wang, Wenguan
AU - Zhou, Tianfei
AU - Quan, Ruijie
AU - Yang, Yi
N1 - Publisher Copyright: © 1979-2012 IEEE.
PY - 2024/4/1
Y1 - 2024/4/1
AB - Humans are able to recognize structured relations in observation, allowing us to decompose complex scenes into simpler parts and abstract the visual world at multiple levels. However, this hierarchical reasoning ability of human perception remains largely unexplored in the current semantic segmentation literature. Existing works typically consider only flat labels and treat all semantic categories as mutually exclusive for each pixel. In this work, we instead address hierarchical semantic segmentation (HSS), with the aim of providing a structured, pixel-wise description of visual observation in terms of a class hierarchy. We devise Hssn, a general HSS framework that tackles two critical issues in this task: i) how to efficiently adapt existing hierarchy-agnostic segmentation networks to the HSS setting, and ii) how to leverage the class hierarchy to regularize HSS network learning. To address i), Hssn directly casts HSS as a pixel-wise multi-label classification task, requiring only minimal architectural changes to current segmentation models. To solve ii), Hssn first uses inherent properties of the hierarchy as a training objective, which forces segmentation predictions to obey the hierarchy structure. Furthermore, with a set of hierarchy-induced margin constraints, Hssn efficiently reshapes the learned pixel embedding space, so as to generate hierarchy-aware pixel representations and ultimately facilitate structured segmentation. Building upon Hssn, we further exploit the mutual exclusion relation between semantic labels and strengthen the margin-based regularization strategy with more meaningful constraints, leading to Hssn+, a more effective framework for HSS. We conduct extensive experiments on six semantic segmentation datasets (i.e., Mapillary Vistas 2.0, Cityscapes, LIP, PASCAL-Person-Part, PASCAL-Part-58, and PASCAL-Part-108), with different class hierarchies, network architectures, and backbones, and the results confirm the generalization ability and superiority of our algorithms.
KW - Hierarchy constraint
KW - scene parsing
KW - semantic hierarchy
KW - semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85177043649&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2023.3332435
DO - 10.1109/TPAMI.2023.3332435
M3 - Article
AN - SCOPUS:85177043649
SN - 0162-8828
VL - 46
SP - 2123
EP - 2138
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 4
ER -