Abstract
Given the shortcomings of high-precision stereo matching networks based on binocular vision, such as high computing-resource consumption, long running time, and unsuitability for real-time navigation in intelligent driving systems, this study proposes a dynamic fusion stereo matching deep learning network that meets in-vehicle real-time and accuracy requirements. The network uses a global deep-convolution-based attention module to perform feature extraction while reducing the number of network layers and parameters, and it accelerates the commonly used 3D feature fusion process by optimizing 3D convolution through dynamic cost cascade fusion, multi-scale fusion, and dynamic disparity change. The trained model is tested on the KITTI Stereo 2015 dataset using onboard hardware such as the NVIDIA Jetson TX2. Experiments show that the method achieves accuracy comparable to state-of-the-art methods currently on the leaderboard, with a 3-pixel error below 6.58% and a runtime under 0.1 seconds per frame, meeting real-time performance requirements.
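To make the pipeline described in the abstract concrete, the sketch below shows the general shape of an attention-weighted feature extractor, a concatenation cost volume, lightweight 3D aggregation, and soft-argmin disparity regression. It is a minimal illustration under assumed module names and sizes (`AttentionFeatureExtractor`, `StereoNetSketch`, `max_disp=48`, etc.), not the paper's implementation; in particular, the dynamic cost cascade fusion and dynamic disparity change described by the authors are replaced here by a plain 3D convolution stack.

```python
# Minimal sketch of a cost-volume stereo matching pipeline of the general kind
# described in the abstract. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionFeatureExtractor(nn.Module):
    """2D feature extractor with a simple global channel-attention gate."""

    def __init__(self, in_ch=3, feat_ch=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Global pooling plus a 1x1 conv produces per-channel attention weights.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(feat_ch, feat_ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        f = self.backbone(x)
        return f * self.attn(f)  # re-weight channels by global context


def build_cost_volume(left_feat, right_feat, max_disp):
    """Concatenation cost volume at 1/4 resolution: (B, 2C, D, H, W)."""
    b, c, h, w = left_feat.shape
    vol = left_feat.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        vol[:, :c, d, :, d:] = left_feat[:, :, :, d:]
        vol[:, c:, d, :, d:] = right_feat[:, :, :, : w - d]
    return vol


class StereoNetSketch(nn.Module):
    def __init__(self, max_disp=48, feat_ch=32):
        super().__init__()
        self.max_disp = max_disp
        self.extract = AttentionFeatureExtractor(feat_ch=feat_ch)
        # Lightweight 3D aggregation; the paper's dynamic/cascaded fusion of the
        # cost volume is omitted here for brevity.
        self.aggregate = nn.Sequential(
            nn.Conv3d(2 * feat_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 1, 3, padding=1),
        )

    def forward(self, left, right):
        lf, rf = self.extract(left), self.extract(right)
        cost = self.aggregate(build_cost_volume(lf, rf, self.max_disp)).squeeze(1)
        prob = F.softmax(-cost, dim=1)                      # (B, D, H, W)
        disp = torch.arange(self.max_disp, device=prob.device, dtype=prob.dtype)
        disp = (prob * disp.view(1, -1, 1, 1)).sum(dim=1)   # soft-argmin regression
        # Upsample to input resolution and rescale disparity by the same factor.
        return 4 * F.interpolate(disp.unsqueeze(1), scale_factor=4,
                                 mode="bilinear", align_corners=False).squeeze(1)


if __name__ == "__main__":
    net = StereoNetSketch()
    left = torch.randn(1, 3, 128, 256)
    right = torch.randn(1, 3, 128, 256)
    print(net(left, right).shape)  # torch.Size([1, 128, 256])
```

The soft-argmin step turns the aggregated cost into a differentiable expectation over candidate disparities, which is the standard way such networks regress sub-pixel disparity and is compatible with the end-to-end training implied by the abstract.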
| Original language | English |
|---|---|
| Pages (from-to) | 1145-1153 |
| Number of pages | 9 |
| Journal | CAAI Transactions on Intelligent Systems |
| Volume | 17 |
| Issue number | 6 |
| DOIs | |
| Publication status | Published - Nov 2022 |
Keywords
- binocular vision
- deep learning
- disparity estimation
- dynamic computation
- feature fusion
- on-board vision
- stereo matching