TY - GEN
T1 - Monocular Depth Estimation with Enhanced Target Awareness for UAV Environment Perception
AU - Zhang, Leyi
AU - Sun, Jian
AU - Zhang, Yanjun
AU - Li, Zhuo
N1 - Publisher Copyright:
© 2025 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2025
Y1 - 2025
N2 - Depth perception has drawn considerable attention recently as it enables 3D sensing capabilities for robots. Monocular depth estimation offers cost-effective and computationally efficient solutions, making it an ideal choice for deployment on resource-constrained platforms such as unmanned aerial vehicles (UAVs). However, existing work primarily focuses on indoor reconstruction and ground-based autonomous driving - methods that are not directly applicable to aerial scenarios. Moreover, during UAV flight, particular emphasis is placed on specific entities such as targets and obstacles, whereas current methods typically perform global depth estimation and thus fail to satisfy this requirement. To tackle these challenges, this work presents a novel approach that jointly learns outdoor depth and target-region information. Target prior knowledge is injected into multi-scale encoder features through a novel mask-guided feature fusion strategy, sharpening object boundaries without adding extra constraints. A target-prioritized loss amplifies the model's attention on regions of interest. Extensive experiments on the DDOS dataset demonstrate that our method achieves state-of-the-art performance.
AB - Depth perception has drawn considerable attention recently as it enables 3D sensing capabilities for robots. Monocular depth estimation offers cost-effective and computationally efficient solutions, making it an ideal choice for deployment on resource-constrained platforms such as unmanned aerial vehicles (UAVs). However, existing work primarily focuses on indoor reconstruction and ground-based autonomous driving - methods that are not directly applicable to aerial scenarios. Moreover, during UAV flight, particular emphasis is placed on specific entities such as targets and obstacles, whereas current methods typically perform global depth estimation and thus fail to satisfy this requirement. To tackle these challenges, this work presents a novel approach that jointly learns outdoor depth and target-region information. Target prior knowledge is injected into multi-scale encoder features through a novel mask-guided feature fusion strategy, sharpening object boundaries without adding extra constraints. A target-prioritized loss amplifies the model's attention on regions of interest. Extensive experiments on the DDOS dataset demonstrate that our method achieves state-of-the-art performance.
KW - Mask-guided Feature Fusion
KW - Monocular Depth Estimation
KW - Target Regions
KW - Unmanned Aerial Vehicles (UAVs)
UR - https://www.scopus.com/pages/publications/105020292685
U2 - 10.23919/CCC64809.2025.11179450
DO - 10.23919/CCC64809.2025.11179450
M3 - Conference contribution
AN - SCOPUS:105020292685
T3 - Chinese Control Conference, CCC
SP - 8314
EP - 8320
BT - Proceedings of the 44th Chinese Control Conference, CCC 2025
A2 - Sun, Jian
A2 - Yin, Hongpeng
PB - IEEE Computer Society
T2 - 44th Chinese Control Conference, CCC 2025
Y2 - 28 July 2025 through 30 July 2025
ER -