TY - JOUR
T1 - Asynchronous Spatio-Temporal Memory Network for Continuous Event-Based Object Detection
AU - Li, Jianing
AU - Li, Jia
AU - Zhu, Lin
AU - Xiang, Xijie
AU - Huang, Tiejun
AU - Tian, Yonghong
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Event cameras, offering extremely high temporal resolution and high dynamic range, have brought a new perspective to addressing common object detection challenges (e.g., motion blur and low light). However, how to learn a better spatio-temporal representation and exploit rich temporal cues from asynchronous events for object detection remains an open issue. To address this problem, we propose a novel asynchronous spatio-temporal memory network (ASTMNet) that directly consumes asynchronous events, rather than converting them into event images before processing, enabling continuous object detection. Technically, ASTMNet learns an asynchronous attention embedding from the continuous event stream by adopting an adaptive temporal sampling strategy and a temporal attention convolutional module. In addition, a spatio-temporal memory module is designed to exploit rich temporal cues via a lightweight yet efficient interweaved recurrent-convolutional architecture. Empirical results show that our approach outperforms state-of-the-art feed-forward frame-based detectors on three datasets by a large margin (i.e., 7.6% on the KITTI Simulated Dataset, 10.8% on the Gen1 Automotive Dataset, and 10.5% on the 1Mpx Detection Dataset). These results demonstrate that event cameras enable robust object detection even in cases where conventional cameras fail, e.g., fast motion and challenging lighting conditions.
AB - Event cameras, offering extremely high temporal resolution and high dynamic range, have brought a new perspective to addressing common object detection challenges (e.g., motion blur and low light). However, how to learn a better spatio-temporal representation and exploit rich temporal cues from asynchronous events for object detection remains an open issue. To address this problem, we propose a novel asynchronous spatio-temporal memory network (ASTMNet) that directly consumes asynchronous events, rather than converting them into event images before processing, enabling continuous object detection. Technically, ASTMNet learns an asynchronous attention embedding from the continuous event stream by adopting an adaptive temporal sampling strategy and a temporal attention convolutional module. In addition, a spatio-temporal memory module is designed to exploit rich temporal cues via a lightweight yet efficient interweaved recurrent-convolutional architecture. Empirical results show that our approach outperforms state-of-the-art feed-forward frame-based detectors on three datasets by a large margin (i.e., 7.6% on the KITTI Simulated Dataset, 10.8% on the Gen1 Automotive Dataset, and 10.5% on the 1Mpx Detection Dataset). These results demonstrate that event cameras enable robust object detection even in cases where conventional cameras fail, e.g., fast motion and challenging lighting conditions.
KW - Object detection
KW - deep neural networks
KW - event cameras
KW - event-based vision
KW - neuromorphic engineering
UR - http://www.scopus.com/inward/record.url?scp=85127774383&partnerID=8YFLogxK
U2 - 10.1109/TIP.2022.3162962
DO - 10.1109/TIP.2022.3162962
M3 - Article
C2 - 35377848
AN - SCOPUS:85127774383
SN - 1057-7149
VL - 31
SP - 2975
EP - 2987
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -