TY - GEN
T1 - Learning Stereo Depth Estimation with Bio-Inspired Spike Cameras
AU - Wang, Yixuan
AU - Li, Jianing
AU - Zhu, Lin
AU - Xiang, Xijie
AU - Huang, Tiejun
AU - Tian, Yonghong
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Bio-inspired spike cameras, offering high temporal resolution spike streams, have brought a new perspective to common challenges (e.g., high-speed motion blur) in depth estimation tasks. In this paper, we propose a novel problem setting, spike-based stereo depth estimation, which is the first trial that explores an end-to-end network to learn stereo depth estimation with transformers for spike cameras, named the Spike-based Stereo Depth Estimation Transformer (SSDEFormer). We first build a hybrid camera platform and provide a new stereo depth estimation dataset (i.e., PKU-Spike-Stereo) with spatiotemporally synchronized labels. Then, we propose a novel spike representation to effectively exploit spatiotemporal information from spike streams. Finally, a transformer-based network is designed to generate dense depth maps without a fixed-disparity cost volume. Experiments show that our approach is highly effective on both synthetic and real-world datasets. The results verify that spike cameras can perform robust depth estimation even in fast-motion scenarios where conventional cameras and event cameras fail.
AB - Bio-inspired spike cameras, offering high temporal resolution spike streams, have brought a new perspective to common challenges (e.g., high-speed motion blur) in depth estimation tasks. In this paper, we propose a novel problem setting, spike-based stereo depth estimation, which is the first trial that explores an end-to-end network to learn stereo depth estimation with transformers for spike cameras, named the Spike-based Stereo Depth Estimation Transformer (SSDEFormer). We first build a hybrid camera platform and provide a new stereo depth estimation dataset (i.e., PKU-Spike-Stereo) with spatiotemporally synchronized labels. Then, we propose a novel spike representation to effectively exploit spatiotemporal information from spike streams. Finally, a transformer-based network is designed to generate dense depth maps without a fixed-disparity cost volume. Experiments show that our approach is highly effective on both synthetic and real-world datasets. The results verify that spike cameras can perform robust depth estimation even in fast-motion scenarios where conventional cameras and event cameras fail.
KW - Stereo depth estimation
KW - neuromorphic vision
KW - spike camera
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85137188701&partnerID=8YFLogxK
U2 - 10.1109/ICME52920.2022.9859975
DO - 10.1109/ICME52920.2022.9859975
M3 - Conference contribution
AN - SCOPUS:85137188701
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - ICME 2022 - IEEE International Conference on Multimedia and Expo 2022, Proceedings
PB - IEEE Computer Society
T2 - 2022 IEEE International Conference on Multimedia and Expo, ICME 2022
Y2 - 18 July 2022 through 22 July 2022
ER -