Abstract
Recent performance gains in Bird's-Eye-View (BEV) visual detection owe much to the widespread use of deformable attention. Through cross-attention, deformable attention transfers features from image space to BEV space. At high feature-map resolutions, its computational cost is much lower than that of global attention, so it can support higher-resolution BEV features. However, it also has shortcomings, such as a small receptive field and insufficient information exchange. We propose a deformable attention mechanism with dynamic reference points. The module accumulates the reference points of each cross-attention layer on top of those of the previous layer, effectively expanding the receptive field that BEV features query in image space. Extensive experiments on the nuScenes benchmark demonstrate the effectiveness of our method.
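The layer-by-layer accumulation of reference points can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: the abstract does not say how each layer's update is computed, so the function name `accumulate_reference_points`, the per-layer `layer_offsets` input, and the clipping to normalized [0, 1] image coordinates are all assumptions made for the example.

```python
import numpy as np


def accumulate_reference_points(initial_ref, layer_offsets):
    """Carry reference points forward across cross-attention layers.

    initial_ref:   (Q, 2) array of normalized (x, y) reference points,
                   one per BEV query (hypothetical layout).
    layer_offsets: sequence of (Q, 2) arrays, one per layer; each is the
                   shift that layer contributes (assumed to be given).

    Returns an array of shape (L + 1, Q, 2): entry l holds the reference
    points used by layer l, and the last entry holds the points that
    result after the final layer's offset has been added.
    """
    ref = np.asarray(initial_ref, dtype=float)
    refs = [ref]
    for offset in layer_offsets:
        # Each layer starts from the previous layer's reference points
        # instead of resetting to the initial ones.
        ref = np.clip(ref + np.asarray(offset, dtype=float), 0.0, 1.0)
        refs.append(ref)
    return np.stack(refs)


# Usage: one BEV query, two layers of (assumed) offsets.
refs = accumulate_reference_points(
    initial_ref=[[0.5, 0.5]],
    layer_offsets=[[[0.1, 0.0]], [[0.2, -0.1]]],
)
```

With static reference points every layer would sample around `refs[0]`; in this sketch later layers sample around progressively shifted locations, which is one way to read the abstract's claim of an expanded receptive field.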
| Original language | English |
|---|---|
| Pages (from-to) | 886-890 |
| Number of pages | 5 |
| Journal | IEEE Journal of Radio Frequency Identification |
| Volume | 6 |
| Publication status | Published - 2022 |
Keywords
- autonomous driving
- bird's-eye-view
- dynamic deformable attention
- visual detection