TY - JOUR
T1 - Learning intrinsic decomposition with semantic information fusion based on transformer
AU - ZHAO, Pengjie
AU - SHA, Hao
AU - WANG, Yongtian
AU - LIU, Yue
N1 - Publisher Copyright:
© 2025 Beijing Zhongke Journal Publishing Co. Ltd.
PY - 2025/12
Y1 - 2025/12
N2 - Intrinsic decomposition, the process of decomposing an image into reflectance and shading, is widely used in virtual and augmented reality tasks. Reflectance and shading often exhibit large gradients at object edges, and the intrinsic properties on the same object tend to be similar. This spatial coherence is closely related to semantic consistency because objects within the same semantic category often exhibit similar intrinsic properties. Therefore, incorporating semantic segmentation into a deep intrinsic decomposition framework helps the network distinguish between different object instances and understand high-level scene structures. To this end, we design an intrinsic decomposition network jointly trained with a dedicated semantic segmentation module, allowing semantic cues to enhance the decomposition of reflectance and shading. The semantic module provides guidance during training but is removed during inference, improving performance without increasing the inference cost. Additionally, to capture the global contextual dependencies critical for intrinsic decomposition, we adopt a Transformer-based backbone. The proposed backbone enables the model to associate distant regions with similar material properties, thereby maintaining consistency in reflectance and learning smooth illumination patterns across a scene. A convolutional decoder is also designed to output predictions with improved details. Experiments demonstrate that our approach achieves state-of-the-art performance in quantitative evaluations on the Intrinsic Images in the Wild (IIW) and Shading Annotations in the Wild (SAW) datasets.
AB - Intrinsic decomposition, the process of decomposing an image into reflectance and shading, is widely used in virtual and augmented reality tasks. Reflectance and shading often exhibit large gradients at object edges, and the intrinsic properties on the same object tend to be similar. This spatial coherence is closely related to semantic consistency because objects within the same semantic category often exhibit similar intrinsic properties. Therefore, incorporating semantic segmentation into a deep intrinsic decomposition framework helps the network distinguish between different object instances and understand high-level scene structures. To this end, we design an intrinsic decomposition network jointly trained with a dedicated semantic segmentation module, allowing semantic cues to enhance the decomposition of reflectance and shading. The semantic module provides guidance during training but is removed during inference, improving performance without increasing the inference cost. Additionally, to capture the global contextual dependencies critical for intrinsic decomposition, we adopt a Transformer-based backbone. The proposed backbone enables the model to associate distant regions with similar material properties, thereby maintaining consistency in reflectance and learning smooth illumination patterns across a scene. A convolutional decoder is also designed to output predictions with improved details. Experiments demonstrate that our approach achieves state-of-the-art performance in quantitative evaluations on the Intrinsic Images in the Wild (IIW) and Shading Annotations in the Wild (SAW) datasets.
KW - Augmented reality
KW - Detachable decoder
KW - Intrinsic image decomposition
KW - Joint learning
KW - Semantic segmentation
UR - https://www.scopus.com/pages/publications/105027931692
U2 - 10.1016/j.vrih.2025.09.001
DO - 10.1016/j.vrih.2025.09.001
M3 - Article
AN - SCOPUS:105027931692
SN - 2096-5796
VL - 7
SP - 543
EP - 559
JO - Virtual Reality and Intelligent Hardware
JF - Virtual Reality and Intelligent Hardware
IS - 6
ER -