SA-FlowNet: Event-based self-attention optical flow estimation with spiking-analogue neural networks

Fan Yang; Li Su; Jinxiu Zhao; Xuena Chen; Xiangyu Wang; Na Jiang; Quan Hu

doi:10.1049/cvi2.12206

Fan Yang, Li Su^*, Jinxiu Zhao, Xuena Chen, Xiangyu Wang, Na Jiang, Quan Hu

^*Corresponding author for this work

School of Aerospace Engineering

Capital Normal University

Research output: Contribution to journal › Article › Peer-reviewed

4 Citations (Scopus)

Abstract

Inspired by biological vision mechanisms, event-based cameras have been developed to capture continuous object motion and detect brightness changes independently and asynchronously, overcoming the limitations of traditional frame-based cameras. Complementarily, spiking neural networks (SNNs) offer asynchronous computations and exploit the inherent sparseness of spatio-temporal events. Notably, event-based pixel-wise optical flow estimation calculates the positions and relationships of objects in adjacent frames; however, because event camera outputs are sparse and uneven, dense scene information is difficult to generate, and the local receptive fields of the neural network lead to poor tracking of moving objects. To address these issues, an improved event-based self-attention optical flow estimation network (SA-FlowNet) is proposed that independently uses criss-cross and temporal self-attention mechanisms, directly capturing long-range dependencies and efficiently extracting the temporal and spatial features from the event streams. In the former mechanism, a cross-domain attention scheme that dynamically fuses the temporal-spatial features is introduced. The proposed network adopts a spiking-analogue neural network architecture using an end-to-end learning method and gains significant computational energy benefits, especially for SNNs. State-of-the-art error rates for optical flow prediction on the Multi-Vehicle Stereo Event Camera (MVSEC) dataset, compared with current SNN-based approaches, are demonstrated.

Original language	English
Pages (from-to)	925-935
Number of pages	11
Journal	IET Computer Vision
Volume	17
Issue	8
DOI	https://doi.org/10.1049/cvi2.12206
Publication status	Published - Dec 2023

Cite this

@article{a423ddd08c6c46ffbc9e5b7297d6985e,

title = "SA-FlowNet: Event-based self-attention optical flow estimation with spiking-analogue neural networks",

abstract = "Inspired by biological vision mechanism, event-based cameras have been developed to capture continuous object motion and detect brightness changes independently and asynchronously, which overcome the limitations of traditional frame-based cameras. Complementarily, spiking neural networks (SNNs) offer asynchronous computations and exploit the inherent sparseness of spatio-temporal events. Notably, event-based pixel-wise optical flow estimations calculate the positions and relationships of objects in adjacent frames; however, as event camera outputs are sparse and uneven, dense scene information is difficult to generate and the local receptive fields of the neural network also lead to poor moving objects tracking. To address these issues, an improved event-based self-attention optical flow estimation network (SA-FlowNet) that independently uses criss-cross and temporal self-attention mechanisms, directly capturing long-range dependencies and efficiently extracting the temporal and spatial features from the event streams is proposed. In the former mechanism, a cross-domain attention scheme dynamically fusing the temporal-spatial features is introduced. The proposed network adopts a spiking-analogue neural network architecture using an end-to-end learning method and gains significant computational energy benefits especially for SNNs. The state-of-the-art results of the error rate for optical flow prediction on the Multi-Vehicle Stereo Event Camera (MVSEC) dataset compared with the current SNN-based approaches is demonstrated.",

keywords = "computer vision, feature extraction, motion estimation, optical tracking",

author = "Fan Yang and Li Su and Jinxiu Zhao and Xuena Chen and Xiangyu Wang and Na Jiang and Quan Hu",

note = "Publisher Copyright: {\textcopyright} 2023 The Authors. IET Computer Vision published by John Wiley \& Sons Ltd on behalf of The Institution of Engineering and Technology.",

year = "2023",

month = dec,

doi = "10.1049/cvi2.12206",

language = "English",

volume = "17",

pages = "925--935",

journal = "IET Computer Vision",

issn = "1751-9632",

publisher = "John Wiley \& Sons Inc.",

number = "8",

}

TY - JOUR

T1 - SA-FlowNet

T2 - Event-based self-attention optical flow estimation with spiking-analogue neural networks

AU - Yang, Fan

AU - Su, Li

AU - Zhao, Jinxiu

AU - Chen, Xuena

AU - Wang, Xiangyu

AU - Jiang, Na

AU - Hu, Quan

PY - 2023/12

Y1 - 2023/12

AB - Inspired by biological vision mechanism, event-based cameras have been developed to capture continuous object motion and detect brightness changes independently and asynchronously, which overcome the limitations of traditional frame-based cameras. Complementarily, spiking neural networks (SNNs) offer asynchronous computations and exploit the inherent sparseness of spatio-temporal events. Notably, event-based pixel-wise optical flow estimations calculate the positions and relationships of objects in adjacent frames; however, as event camera outputs are sparse and uneven, dense scene information is difficult to generate and the local receptive fields of the neural network also lead to poor moving objects tracking. To address these issues, an improved event-based self-attention optical flow estimation network (SA-FlowNet) that independently uses criss-cross and temporal self-attention mechanisms, directly capturing long-range dependencies and efficiently extracting the temporal and spatial features from the event streams is proposed. In the former mechanism, a cross-domain attention scheme dynamically fusing the temporal-spatial features is introduced. The proposed network adopts a spiking-analogue neural network architecture using an end-to-end learning method and gains significant computational energy benefits especially for SNNs. The state-of-the-art results of the error rate for optical flow prediction on the Multi-Vehicle Stereo Event Camera (MVSEC) dataset compared with the current SNN-based approaches is demonstrated.

KW - computer vision

KW - feature extraction

KW - motion estimation

KW - optical tracking

UR - http://www.scopus.com/inward/record.url?scp=85159311234&partnerID=8YFLogxK

U2 - 10.1049/cvi2.12206

DO - 10.1049/cvi2.12206

M3 - Article

AN - SCOPUS:85159311234

SN - 1751-9632

VL - 17

SP - 925

EP - 935

JO - IET Computer Vision

JF - IET Computer Vision

IS - 8

ER -
