Abstract
This paper proposes a deep convolutional neural network with an encoder-decoder structure and constrains the network's learning of depth from a monocular image at both the two-dimensional (2D) and three-dimensional (3D) levels. At the 2D image level, a channel attention mechanism is introduced to connect encoder features with decoder features at the same scale through learned weights, so as to balance the shallow detail features and deep semantic features extracted by the network. In addition, a scale-invariant loss and a multi-scale edge loss based on image pyramids are designed to obtain a depth map with rich edge detail. At the 3D geometric level, a global geometric constraint loss and a local geometric constraint loss on depth are designed based on the local and global geometric relationships of coordinate points in space, in order to enhance the geometric consistency between point clouds. Furthermore, the results obtained by the proposed method are quantitatively and qualitatively compared with those of other methods on the NYU Depth-v2 dataset, and it is shown that the proposed method estimates indoor scene depth with higher accuracy and better detail representation, producing accurate and smooth 3D reconstructions from a single image.
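The abstract does not give the exact form of the scale-invariant loss; a minimal sketch of a common formulation of such a loss (an Eigen-style scale-invariant log-depth error, which may differ from the paper's definition) is shown below. The function name, the balancing weight `lam`, and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def scale_invariant_loss(pred, target, lam=0.5):
    """Scale-invariant log-depth loss (a common Eigen-style formulation;
    the paper's exact definition may differ).

    pred, target: arrays of positive depth values of the same shape.
    lam: weight on the squared-mean term; lam=1 makes the loss fully
         invariant to a global scaling of the predicted depth.
    """
    d = np.log(pred) - np.log(target)   # per-pixel log-depth error
    n = d.size
    # mean squared error minus a weighted squared mean of the error
    return (d ** 2).sum() / n - lam * (d.sum() ** 2) / (n ** 2)
```

With `lam=1` the loss equals the variance of the log-depth error, so multiplying the predicted depth map by any positive constant leaves the loss unchanged.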
| Translated title of the contribution | Monocular Indoor Depth Estimation Method Based on Neural Networks with Constraints on Two-Dimensional Images and Three-Dimensional Geometry |
|---|---|
| Original language | Chinese (Traditional) |
| Article number | 1911001 |
| Journal | Guangxue Xuebao/Acta Optica Sinica |
| Volume | 42 |
| Issue number | 19 |
| DOIs | |
| Publication status | Published - Oct 2022 |