Self-Supervised Monocular Depth Estimation for Dynamic Targets

Jing Zhao, Liquan Dong*, Haojie Liu, Chengwei Lv, Rujia Zhang, Lingqin Kong, Ming Liu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Monocular depth estimation is a classic problem in computer vision and is widely used in applications such as 3D scene reconstruction and augmented reality. This paper presents a self-supervised monocular depth estimation model for dynamic targets, designed to address the shortcomings of current unsupervised monocular depth estimation in scenes with moving objects. To improve depth accuracy in dynamic areas, a cross-attention mechanism is used to strengthen the fusion of local dynamic-area features; this refines the overall depth structure, expands the receptive field, and enhances the representation of target-area features. In addition, a depth prior, referred to as pseudo-depth, is generated. Blurred edges on dynamic targets are addressed by matching the surface normals of the predicted depth and the pseudo-depth, and by constraining the relative normal angles of the two depth maps to be consistent around the edges of dynamic areas. Two channel attention modules are also designed to integrate semantic information across scales and improve the overall modeling effect. Experiments are conducted on the KITTI and DDAD datasets. The results show that the proposed method outperforms mainstream monocular depth estimation methods, particularly in dynamic areas. Compared with the baseline network, the absolute relative error and squared relative error are reduced by up to 30% and 70%, respectively.
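
The two error metrics quoted in the abstract are not defined on this page. As a minimal sketch, assuming the definitions commonly used in KITTI-style depth evaluation (the abstract does not confirm which variant the authors use), they could be computed as follows. The function name `abs_rel_and_sq_rel` and the toy depth values are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence, Tuple


def abs_rel_and_sq_rel(pred: Sequence[float], gt: Sequence[float]) -> Tuple[float, float]:
    """Return (absolute relative error, squared relative error).

    Assumed definitions (common in depth-estimation benchmarks):
      Abs Rel = mean(|pred - gt| / gt)
      Sq Rel  = mean((pred - gt)^2 / gt)
    Pixels without valid ground truth (gt <= 0) are skipped.
    """
    # Keep only pixels that have a valid ground-truth depth.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    return abs_rel, sq_rel


# Toy example (hypothetical depths in metres); the third pixel has no ground truth.
pred_depth = [2.0, 4.5, 10.0]
gt_depth = [2.0, 5.0, 0.0]
abs_rel, sq_rel = abs_rel_and_sq_rel(pred_depth, gt_depth)
print(abs_rel, sq_rel)
```

Both metrics normalize by the ground-truth depth, so errors on nearby objects weigh more than equal-sized errors on distant ones. The squared variant additionally penalizes large outliers, which may be one reason it shows the larger reported reduction.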

Original language: English
Title of host publication: Tenth Symposium on Novel Optoelectronic Detection Technology and Applications
Editors: Chen Ping
Publisher: SPIE
ISBN (Electronic): 9781510688148
Publication status: Published - 2025
Event: 10th Symposium on Novel Optoelectronic Detection Technology and Applications - Taiyuan, China
Duration: 1 Nov 2024 to 3 Nov 2024

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 13511
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: 10th Symposium on Novel Optoelectronic Detection Technology and Applications
Country/Territory: China
City: Taiyuan
Period: 1/11/24 to 3/11/24

Keywords

  • Dynamic Region Feature Enhancement
  • Monocular Depth Estimation
  • Self-Supervised Learning
