Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

Shihao Wang; Yingfei Liu; Tiancai Wang; Ying Li; Xiangyu Zhang

doi:10.1109/ICCV51070.2023.00335

Shihao Wang, Yingfei Liu, Tiancai Wang^*, Ying Li, Xiangyu Zhang

^*Corresponding author for this work

School of Mechanical Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

25 Citations (Scopus)

Abstract

In this paper, we propose a long-sequence modeling framework, named StreamPETR, for multi-view 3D object detection. Built upon the sparse query design in the PETR series, we systematically develop an object-centric temporal mechanism. The model runs in an online manner, and long-term historical information is propagated through object queries frame by frame. In addition, we introduce a motion-aware layer normalization to model the movement of objects. Compared to the single-frame baseline, StreamPETR achieves significant performance improvements at negligible additional computation cost. On the standard nuScenes benchmark, it is the first online multi-view method to achieve performance (67.6% NDS & 65.3% AMOTA) comparable to lidar-based methods. The lightweight version achieves 45.0% mAP and 31.7 FPS, outperforming the state-of-the-art method (SOLOFusion) by 2.3% mAP while running 1.8× faster. Code is available at https://github.com/exiawsh/StreamPETR.git.

Original language	English
Title of host publication	Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	3598-3608
Number of pages	11
ISBN (Electronic)	9798350307184
DOIs	https://doi.org/10.1109/ICCV51070.2023.00335
Publication status	Published - 2023
Event	2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 - Paris, France Duration: 2 Oct 2023 → 6 Oct 2023

Publication series

Name	Proceedings of the IEEE International Conference on Computer Vision
ISSN (Print)	1550-5499

Conference

Conference	2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Country/Territory	France
City	Paris
Period	2/10/23 → 6/10/23

Access to Document

10.1109/ICCV51070.2023.00335

Cite this

Wang, S., Liu, Y., Wang, T., Li, Y., & Zhang, X. (2023). Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 (pp. 3598-3608). (Proceedings of the IEEE International Conference on Computer Vision). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICCV51070.2023.00335

Wang, Shihao ; Liu, Yingfei ; Wang, Tiancai et al. / Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection. Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 3598-3608 (Proceedings of the IEEE International Conference on Computer Vision).

@inproceedings{5de15b85aa4145ee9caf6cbe7e20295d,

title = "Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection",

abstract = "In this paper, we propose a long-sequence modeling framework, named StreamPETR, for multi-view 3D object detection. Built upon the sparse query design in the PETR series, we systematically develop an object-centric temporal mechanism. The model runs in an online manner, and long-term historical information is propagated through object queries frame by frame. In addition, we introduce a motion-aware layer normalization to model the movement of objects. Compared to the single-frame baseline, StreamPETR achieves significant performance improvements at negligible additional computation cost. On the standard nuScenes benchmark, it is the first online multi-view method to achieve performance (67.6\% NDS \& 65.3\% AMOTA) comparable to lidar-based methods. The lightweight version achieves 45.0\% mAP and 31.7 FPS, outperforming the state-of-the-art method (SOLOFusion) by 2.3\% mAP while running 1.8× faster. Code is available at https://github.com/exiawsh/StreamPETR.git.",

author = "Shihao Wang and Yingfei Liu and Tiancai Wang and Ying Li and Xiangyu Zhang",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 ; Conference date: 02-10-2023 Through 06-10-2023",

year = "2023",

doi = "10.1109/ICCV51070.2023.00335",

language = "English",

series = "Proceedings of the IEEE International Conference on Computer Vision",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "3598--3608",

booktitle = "Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023",

address = "United States",

}

Wang, S, Liu, Y, Wang, T, Li, Y & Zhang, X 2023, Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection. in Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Proceedings of the IEEE International Conference on Computer Vision, Institute of Electrical and Electronics Engineers Inc., pp. 3598-3608, 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, 2/10/23. https://doi.org/10.1109/ICCV51070.2023.00335

Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection. / Wang, Shihao; Liu, Yingfei; Wang, Tiancai et al.
Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Institute of Electrical and Electronics Engineers Inc., 2023. p. 3598-3608 (Proceedings of the IEEE International Conference on Computer Vision).

TY - GEN

T1 - Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

AU - Wang, Shihao

AU - Liu, Yingfei

AU - Wang, Tiancai

AU - Li, Ying

AU - Zhang, Xiangyu

PY - 2023

Y1 - 2023

AB - In this paper, we propose a long-sequence modeling framework, named StreamPETR, for multi-view 3D object detection. Built upon the sparse query design in the PETR series, we systematically develop an object-centric temporal mechanism. The model runs in an online manner, and long-term historical information is propagated through object queries frame by frame. In addition, we introduce a motion-aware layer normalization to model the movement of objects. Compared to the single-frame baseline, StreamPETR achieves significant performance improvements at negligible additional computation cost. On the standard nuScenes benchmark, it is the first online multi-view method to achieve performance (67.6% NDS & 65.3% AMOTA) comparable to lidar-based methods. The lightweight version achieves 45.0% mAP and 31.7 FPS, outperforming the state-of-the-art method (SOLOFusion) by 2.3% mAP while running 1.8× faster. Code is available at https://github.com/exiawsh/StreamPETR.git.

UR - http://www.scopus.com/inward/record.url?scp=85179238749&partnerID=8YFLogxK

U2 - 10.1109/ICCV51070.2023.00335

DO - 10.1109/ICCV51070.2023.00335

M3 - Conference contribution

AN - SCOPUS:85179238749

T3 - Proceedings of the IEEE International Conference on Computer Vision

SP - 3598

EP - 3608

BT - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023

Y2 - 2 October 2023 through 6 October 2023

ER -

Wang S, Liu Y, Wang T, Li Y, Zhang X. Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 3598-3608. (Proceedings of the IEEE International Conference on Computer Vision). doi: 10.1109/ICCV51070.2023.00335
