TY - GEN
T1 - Leveraging Temporal Contexts to Enhance Vehicle-Infrastructure Cooperative Perception
AU - Zhong, Jiaru
AU - Yu, Haibao
AU - Zhu, Tianyi
AU - Xu, Jiahui
AU - Yang, Wenxian
AU - Nie, Zaiqing
AU - Sun, Chao
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Infrastructure sensors installed at elevated positions offer a broader perception range and encounter fewer occlusions. Integrating both infrastructure and ego-vehicle data through V2X communication, known as vehicle-infrastructure cooperation, has shown considerable advantages in enhancing perception capabilities and addressing corner cases encountered in single-vehicle autonomous driving. However, cooperative perception still faces numerous challenges, including limited communication bandwidth and practical communication interruptions. In this paper, we propose CTCE, a novel framework for cooperative 3D object detection. This framework transmits queries with temporal contexts enhancement, effectively balancing transmission efficiency and performance to accommodate real-world communication conditions. Additionally, we propose a temporal-guided fusion module to further improve performance. The roadside temporal enhancement and vehicle-side spatial-temporal fusion together constitute a multi-level temporal contexts integration mechanism, fully leveraging temporal information to enhance performance. Furthermore, a motion-aware reconstruction module is introduced to recover roadside queries lost due to communication interruptions. Experimental results on the V2X-Seq and V2X-Sim datasets demonstrate that CTCE outperforms the baseline QUEST, achieving improvements of 3.8% and 1.3% in mAP, respectively. Experiments under communication interruption conditions further validate CTCE's robustness.
KW - 3D Object Detection
KW - Autonomous Driving
KW - Cooperative Perception
KW - Temporal
KW - Transformer and Query
UR - http://www.scopus.com/inward/record.url?scp=105001672310&partnerID=8YFLogxK
U2 - 10.1109/ITSC58415.2024.10920140
DO - 10.1109/ITSC58415.2024.10920140
M3 - Conference contribution
AN - SCOPUS:105001672310
T3 - IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC
SP - 915
EP - 922
BT - 2024 IEEE 27th International Conference on Intelligent Transportation Systems, ITSC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th IEEE International Conference on Intelligent Transportation Systems, ITSC 2024
Y2 - 24 September 2024 through 27 September 2024
ER -