Self-Supervised Monocular Depth Estimation for All-Day Images Based on Dual-Axis Transformer

Shengyu Hou, Mengyin Fu, Rongchuan Wang, Yi Yang, Wenjie Song*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

All-day self-supervised monocular depth estimation is of strong practical significance because it allows autonomous systems to continuously perceive the 3D structure of the world. However, night-time scenes pose two challenges. Low illumination produces weak texture, and varying lighting violates the brightness consistency assumption. As a result, most existing self-supervised models can only handle day-time scenes. To address this problem, we propose a unified self-supervised monocular depth estimation framework that handles all-day scenarios and has three features: 1) an Illumination Compensation PoseNet (ICP), based on the classic Phong illumination model, that compensates for lighting changes between adjacent frames by estimating per-pixel transformations; 2) a Dual-Axis Transformer (DAT) block, used as the backbone of the depth encoder, that infers the depth of local low-illumination areas from global context gathered along both the spatial and channel dimensions of night-time images; 3) a cross-layer Adaptive Fusion Module (AFM), introduced between multiple DAT blocks, that learns attention weights between features from different layers and uses these weights to fuse them adaptively, enhancing the complementarity of features across layers. The framework was evaluated on the RobotCar, Waymo and KITTI datasets and achieved state-of-the-art results in both day-time and night-time scenarios.
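The abstract states only that ICP, building on the Phong illumination model, compensates for lighting changes between adjacent frames with per-pixel transformations. The following minimal sketch shows how such compensation can enter a self-supervised photometric loss. It relies on one assumption that is not a detail given in the abstract: the per-pixel transformation is an affine map C(p)·I(p) + B(p), with a gain map C and an offset map B that a pose network would predict alongside the relative pose. Here the maps are passed in directly, images are tiny grayscale lists, the source frame is assumed to be already warped into the target view, and the function names `compensate` and `l1_photometric` are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities


def compensate(src: Image, gain: Image, offset: Image) -> Image:
    """Apply an assumed per-pixel affine illumination change: C(p) * I(p) + B(p).

    The affine form is an illustrative assumption. In a full pipeline the
    gain/offset maps would be predicted by a network; here they are inputs.
    """
    return [
        [c * i + b for i, c, b in zip(row_i, row_c, row_b)]
        for row_i, row_c, row_b in zip(src, gain, offset)
    ]


def l1_photometric(a: Image, b: Image) -> float:
    """Mean absolute intensity difference between two same-sized images."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    count = sum(len(row) for row in a)
    return total / count


if __name__ == "__main__":
    # Toy pair: the (already warped) source frame is the target at half
    # brightness, i.e. a lighting change that breaks brightness constancy.
    target = [[0.2, 0.4], [0.6, 0.8]]
    source = [[0.1, 0.2], [0.3, 0.4]]
    gain = [[2.0, 2.0], [2.0, 2.0]]    # per-pixel gain C(p)
    offset = [[0.0, 0.0], [0.0, 0.0]]  # per-pixel offset B(p)

    print("uncompensated loss:", l1_photometric(source, target))
    print("compensated loss:", l1_photometric(compensate(source, gain, offset), target))
```

With these toy maps the uncompensated loss is about 0.25, while the compensated loss is 0. The gain map absorbs the lighting change, so the remaining photometric error no longer penalizes a pure illumination difference between the frames.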

Original language: English
Pages (from-to): 9939-9953
Number of pages: 15
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 34
Issue number: 10
Publication status: Published - 2024

Keywords

  • Monocular depth estimation
  • multi-task learning
  • transformer network
  • unsupervised estimation
