Learning intrinsic decomposition with semantic information fusion based on transformer

  • Pengjie ZHAO
  • Hao SHA
  • Yongtian WANG
  • Yue LIU*

* Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

Abstract

Intrinsic decomposition, the process of decomposing an image into reflectance and shading, is widely used in virtual and augmented reality tasks. Reflectance and shading often exhibit large gradients at object edges, and the intrinsic properties within the same object tend to be similar. This spatial coherence is closely related to semantic consistency because objects within the same semantic category often exhibit similar intrinsic properties. Therefore, incorporating semantic segmentation into a deep intrinsic decomposition framework helps the network distinguish between different object instances and understand high-level scene structures. To this end, we design an intrinsic decomposition network jointly trained with a dedicated semantic segmentation module, allowing semantic cues to enhance the decomposition of reflectance and shading. The semantic module provides guidance during training but is removed during inference, improving performance without increasing the inference cost. Additionally, to capture the global contextual dependencies critical for intrinsic decomposition, we adopt a Transformer-based backbone. The proposed backbone enables the model to associate distant regions with similar material properties, thereby maintaining consistency in reflectance and learning smooth illumination patterns across a scene. A convolutional decoder is also designed to output predictions with improved details. Experiments demonstrate that our approach achieves state-of-the-art performance in quantitative evaluations on the Intrinsic Images in the Wild (IIW) and Shading Annotations in the Wild (SAW) datasets.
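
The training-only semantic module described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: plain dense layers stand in for the Transformer backbone and the convolutional decoder, and every name here (`JointModel`, `detach_semantic_head`, `joint_loss`, the 0.5 loss weight) is a hypothetical chosen for illustration. The only point it shows is that removing an auxiliary head that shares the backbone leaves the decomposition output and its inference cost unchanged.

```python
import random


def init(rows, cols, rng):
    """Random weight matrix with `rows` outputs and `cols` inputs."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Apply a dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class JointModel:
    """Toy stand-in: shared backbone, decomposition head, removable semantic head."""

    def __init__(self, in_dim=4, feat_dim=3, num_classes=2, seed=0):
        rng = random.Random(seed)
        self.backbone = init(feat_dim, in_dim, rng)
        # Two outputs standing in for reflectance and shading predictions.
        self.decomp_head = init(2, feat_dim, rng)
        # Auxiliary head, used only while training (assumed structure).
        self.semantic_head = init(num_classes, feat_dim, rng)

    def forward(self, x, training):
        feats = linear(self.backbone, x)
        out = {"decomp": linear(self.decomp_head, feats)}
        if training and self.semantic_head is not None:
            out["semantic"] = linear(self.semantic_head, feats)
        return out

    def detach_semantic_head(self):
        """Drop the auxiliary head so inference never evaluates it."""
        self.semantic_head = None


def joint_loss(decomp_loss, semantic_loss, weight=0.5):
    """Weighted sum of the two objectives; the weight is a hypothetical value."""
    return decomp_loss + weight * semantic_loss


model = JointModel()
x = [0.2, 0.4, 0.6, 0.8]
train_out = model.forward(x, training=True)   # both heads run
model.detach_semantic_head()
infer_out = model.forward(x, training=False)  # decomposition head only
```

In this sketch the semantic head influences the shared backbone only through the gradient of `joint_loss` during training, so detaching it afterwards does not alter the decomposition path.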

Original language: English
Pages (from-to): 543-559
Number of pages: 17
Journal: Virtual Reality and Intelligent Hardware
Volume: 7
Issue number: 6
Publication status: Published - Dec 2025
Externally published: Yes

Keywords

  • Augmented reality
  • Detachable decoder
  • Intrinsic image decomposition
  • Joint learning
  • Semantic segmentation
