Abstract
A novel multi-task perception network that integrates temporal multi-scale bird's-eye view (BEV) features is proposed to address insufficient temporal feature fusion and the difficulty of reliably perceiving occluded or distant targets. First, by modeling the predicted depth as a probability distribution, an occlusion-adaptive module is built to estimate visible depth, map image features into BEV features, and supervise this mapping with depth maps. Next, to improve long-distance obstacle detection, a temporal BEV sampling module based on the deformable attention mechanism is designed to perform weighted fusion of multi-scale BEV features across time. Finally, with data augmentation strategies extended to multiple tasks, 3D object detection and lane line segmentation are carried out by separate task heads. Results on the nuScenes dataset and in real-vehicle experiments show that the approach improves detection accuracy for occluded areas and distant targets, and that its inference speed meets the requirements of real-world applications.
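The depth-probability step described in the abstract can be illustrated with a minimal sketch. It is not the authors' occlusion-adaptive module. It assumes the generic "lift" formulation, in which each pixel's feature vector is weighted by a per-pixel categorical distribution over depth bins, and the depth branch is supervised with cross-entropy against depth-bin labels. The function names (`lift_features`, `depth_cross_entropy`), the tensor shapes, and the random inputs are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def lift_features(feat, depth_logits):
    """Weight image features by a per-pixel depth distribution.

    feat:         (C, H, W) image features (assumed layout)
    depth_logits: (D, H, W) unnormalized scores over D depth bins
    returns:      (D, C, H, W) frustum features, one slice per depth bin
    """
    depth_prob = softmax(depth_logits, axis=0)            # (D, H, W)
    return depth_prob[:, None, :, :] * feat[None, :, :, :]


def depth_cross_entropy(depth_logits, gt_bins):
    """Mean cross-entropy between predicted depth bins and labels.

    gt_bins: (H, W) integer index of the ground-truth depth bin per pixel
             (assumed to come from a projected depth map).
    """
    depth_prob = softmax(depth_logits, axis=0)
    picked = np.take_along_axis(depth_prob, gt_bins[None, :, :], axis=0)
    return float(-np.log(picked + 1e-12).mean())


# Hypothetical sizes: 8 channels, 5 depth bins, a 4 x 6 feature map.
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 6))
depth_logits = rng.normal(size=(5, 4, 6))
gt_bins = rng.integers(0, 5, size=(4, 6))

frustum = lift_features(feat, depth_logits)
loss = depth_cross_entropy(depth_logits, gt_bins)
```

Because the depth probabilities at each pixel sum to one, summing the frustum features over the depth axis recovers the original image features. In a full pipeline the frustum features would then be pooled into BEV grid cells using the camera geometry, which this sketch omits.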
| Translated title of the contribution | Multi Task Perception Network Based on Multi-Scale Temporal Sampling |
|---|---|
| Original language | Chinese (Traditional) |
| Pages (from-to) | 789-797 |
| Number of pages | 9 |
| Journal | Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology |
| Volume | 45 |
| Issue number | 8 |
| DOIs | |
| Publication status | Published - Aug 2025 |