Dual-Scale Attention Networks for Efficient Monocular Depth Estimation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper proposes a self-supervised monocular depth estimation method built around the Dual-Scale Attention Module (DSAM). The method combines the strengths of Convolutional Neural Networks (CNNs) and Transformers by adapting the CNN architecture and introducing a spatial-channel synergistic attention mechanism (UniSA) for multi-scale feature processing, which improves the accuracy and robustness of depth estimation. Specifically, the adapted CNN stacks depthwise separable dilated convolutions with different dilation rates, which strengthens local feature extraction and enlarges the receptive field. Compared with existing self-supervised monocular depth estimation methods, DSAM adapts better to complex scenes and dynamic objects, and it is better at capturing fine-grained depth variations and handling abrupt depth changes. Because the framework is self-supervised, the method does not rely on manually labeled depth data, and it performs well across multiple datasets. Experimental results show that DSAM outperforms existing methods on several key metrics, with notable improvements on the KITTI dataset. The contributions of this paper are a new dual-scale attention mechanism, a self-supervised depth estimation framework, and an adapted CNN architecture, which together address feature extraction, feature fusion, and global context modeling in depth estimation.
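The receptive-field claim in the abstract can be illustrated with a little arithmetic. The sketch below is not the paper's implementation: the 3x3 kernel, the dilation rates (1, 2, 3), and the 64-channel width are hypothetical values chosen for illustration, and the helper names are invented here. It computes the one-axis receptive field of stride-1 convolutions stacked in sequence, and compares the weight count of a standard convolution with that of a depthwise separable one (biases ignored).

```python
def receptive_field(kernel_size, dilations):
    """One-axis receptive field of stride-1 convolutions stacked in sequence.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def standard_conv_params(c_in, c_out, kernel_size):
    """Weights in a standard 2D convolution (bias ignored)."""
    return c_in * c_out * kernel_size * kernel_size


def depthwise_separable_params(c_in, c_out, kernel_size):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (bias ignored)."""
    depthwise = c_in * kernel_size * kernel_size  # one k x k filter per input channel
    pointwise = c_in * c_out                      # 1x1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical settings, not taken from the paper.
plain = receptive_field(3, [1, 1, 1])    # three undilated 3x3 layers
dilated = receptive_field(3, [1, 2, 3])  # same depth, increasing dilation
print(plain, dilated)
print(standard_conv_params(64, 64, 3), depthwise_separable_params(64, 64, 3))
```

Under these assumed settings, dilation enlarges the receptive field without adding weights, since the dilation rate does not change the number of kernel parameters, while the depthwise separable factorization reduces the weight count per layer.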

Original language: English
Title of host publication: Proceedings of the 44th Chinese Control Conference, CCC 2025
Editors: Jian Sun, Hongpeng Yin
Publisher: IEEE Computer Society
Pages: 9187-9192
Number of pages: 6
ISBN (Electronic): 9789887581611
Publication status: Published - 2025
Externally published: Yes
Event: 44th Chinese Control Conference, CCC 2025 - Chongqing, China
Duration: 28 Jul 2025 to 30 Jul 2025

Publication series

Name: Chinese Control Conference, CCC
ISSN (Print): 1934-1768
ISSN (Electronic): 2161-2927

Conference

Conference: 44th Chinese Control Conference, CCC 2025
Country/Territory: China
City: Chongqing
Period: 28/07/25 to 30/07/25

Keywords

  • Convolutional Neural Networks (CNN)
  • Dual-Scale Attention
  • Monocular depth
  • Self-supervised Learning
