TY - JOUR
T1 - SGV3D: Toward Scenario Generalization for Vision-Based Roadside 3D Object Detection
AU - Yang, Lei
AU - Zhang, Xinyu
AU - Li, Jun
AU - Wang, Li
AU - Zhang, Chuang
AU - Ju, Li
AU - Li, Zhiwei
AU - Shen, Yang
AU - Lv, Chen
AU - Wang, Hong
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Roadside perception can significantly enhance the safety of autonomous vehicles by extending their perceptual capabilities beyond the visual range and addressing occluded regions. However, current state-of-the-art vision-based roadside detection methods exhibit high accuracy on labeled scenes but perform poorly on new scenes. This limitation arises because roadside cameras remain stationary after installation and can only gather data from a single scene, leading the algorithm to overfit these roadside backgrounds and camera positions. To tackle this issue, we propose an innovative Scenario Generalization Framework for Vision-based Roadside 3D Object Detection, called SGV3D. Specifically, we utilize a Background-suppressed Module (BSM) to reduce background overfitting in vision-centric pipelines by diminishing background features during the 2D to bird's-eye-view projection. Furthermore, by introducing the Semi-supervised Data Generation Pipeline (SSDG) that employs unlabeled images from new scenes, we generate diverse foreground instances with varying camera poses, mitigating the risk of overfitting to specific camera positions. Experiments conducted on two large-scale roadside benchmarks demonstrate that SGV3D, with only a minimal increase in latency, effectively improves the scenario generalization capabilities of vision-based roadside 3D object detectors. The code is available here (https://github.com/yanglei18/SGV3D).
KW - Scenario generalization
KW - autonomous driving
KW - roadside perception
KW - vision-based 3D object detection
UR - https://www.scopus.com/pages/publications/105007289061
U2 - 10.1109/TITS.2025.3569399
DO - 10.1109/TITS.2025.3569399
M3 - Article
AN - SCOPUS:105007289061
SN - 1524-9050
VL - 26
SP - 11782
EP - 11793
JO - IEEE Transactions on Intelligent Transportation Systems
JF - IEEE Transactions on Intelligent Transportation Systems
IS - 8
ER -