TY - GEN
T1 - Adaptive Semantic Fusion Framework for Unsupervised Monocular Depth Estimation
AU - Li, Ruoqi
AU - Yu, Huimin
AU - Du, Kaiyang
AU - Xiao, Zhuoling
AU - Yan, Bo
AU - Yuan, Zhengxi
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Unsupervised monocular depth estimation plays an important role in autonomous driving and has received considerable research attention in recent years. Nevertheless, many existing methods that rely on photometric consistency are excessively susceptible to variations in illumination and suffer in regions with strong reflection. To overcome this limitation, we propose a novel unsupervised depth estimation framework named ColorDepth, which forces the model to exploit object semantics to infer depth. Specifically, we extract pixel-level semantic prior clues of objects using a semantic segmentation network. These priors and the original image are then adaptively fused into the color data via a learnable parameter for depth estimation. The incorporation of semantics endows our model with the ability to perceive scene structure. The fused data effectively alleviates depth ambiguity within the same semantic block, leading to improved consistency and robustness in challenging scenarios. Extensive experiments on the KITTI and Make3D datasets show that our method surpasses previous state-of-the-art methods, even those supervised by additional constraints, and brings significant performance improvements, particularly in highly reflective regions.
AB - Unsupervised monocular depth estimation plays an important role in autonomous driving and has received considerable research attention in recent years. Nevertheless, many existing methods that rely on photometric consistency are excessively susceptible to variations in illumination and suffer in regions with strong reflection. To overcome this limitation, we propose a novel unsupervised depth estimation framework named ColorDepth, which forces the model to exploit object semantics to infer depth. Specifically, we extract pixel-level semantic prior clues of objects using a semantic segmentation network. These priors and the original image are then adaptively fused into the color data via a learnable parameter for depth estimation. The incorporation of semantics endows our model with the ability to perceive scene structure. The fused data effectively alleviates depth ambiguity within the same semantic block, leading to improved consistency and robustness in challenging scenarios. Extensive experiments on the KITTI and Make3D datasets show that our method surpasses previous state-of-the-art methods, even those supervised by additional constraints, and brings significant performance improvements, particularly in highly reflective regions.
KW - Adaptive semantic fusion model
KW - High-reflective regions
KW - Monocular unsupervised depth estimation
UR - http://www.scopus.com/inward/record.url?scp=86000374573&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49357.2023.10096716
DO - 10.1109/ICASSP49357.2023.10096716
M3 - Conference contribution
AN - SCOPUS:86000374573
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Y2 - 4 June 2023 through 10 June 2023
ER -