TY - CONF
T1 - Structured Bird's-Eye View Road Scene Understanding from Surround Video
AU - Jia, Peng
AU - Gong, Jianwei
AU - Jiang, Yahui
AU - Wang, Yuchun
AU - Zhang, Yubo
AU - Ju, Zhiyang
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Autonomous vehicles require an accurate understanding of the surrounding road scene for navigation. One crucial task in this understanding is the bird's-eye view (BEV) road network estimation. However, accurately extracting the BEV road network around the vehicle in complex scenes, considering variations in lane curvature and shape, remains a challenge. This paper aims to accurately represent and learn the BEV road network around the vehicle for structured road scene understanding. Specifically, we propose a road network representation, i.e., representing the lane centerline as an ordered point set and the road network as a directed graph, which accurately describes lane centerline instances and lane topological relationships in complex scenes. Then, we introduce an online road network estimation framework that takes onboard surround-view video as input and utilizes hierarchical query embedding to extract the BEV road network around the vehicle. Furthermore, we present a temporal aggregation module to alleviate occlusion issues in road scenes and enhance the accuracy of road network estimation by incorporating historical frame information flexibly. Finally, we conducted extensive experiments on the nuScenes dataset to validate the effectiveness of the proposed method in structured BEV road scene understanding.
UR - http://www.scopus.com/inward/record.url?scp=85199757887&partnerID=8YFLogxK
U2 - 10.1109/IV55156.2024.10588512
DO - 10.1109/IV55156.2024.10588512
M3 - Conference contribution
AN - SCOPUS:85199757887
T3 - IEEE Intelligent Vehicles Symposium, Proceedings
SP - 3173
EP - 3178
BT - 35th IEEE Intelligent Vehicles Symposium, IV 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 35th IEEE Intelligent Vehicles Symposium, IV 2024
Y2 - 2 June 2024 through 5 June 2024
ER -