TY - JOUR
T1 - Sign Language Gesture Recognition and Classification Based on Event Camera with Spiking Neural Networks
AU - Chen, Xuena
AU - Su, Li
AU - Zhao, Jinxiu
AU - Qiu, Keni
AU - Jiang, Na
AU - Zhai, Guang
N1 - Publisher Copyright:
© 2023 by the authors.
PY - 2023/2
Y1 - 2023/2
N2 - Sign language recognition has been utilized in human–machine interaction, improving the lives of people with speech impairments or those who rely on nonverbal instructions. Thanks to their higher temporal resolution, lower visual redundancy, and lower energy consumption, event cameras based on the dynamic vision sensor (DVS) show promise for sign language recognition in robot perception and intelligent control. Previous work has focused on simple event camera gesture datasets, such as DVS128Gesture; the lack of event camera gesture datasets inspired by sign language remains a great impediment to the development of event camera-based sign language recognition, and an effective method to extract spatio-temporal features from event data is also needed. Event-based sign language gesture datasets are proposed, with data from two sources: traditional sign language videos converted to event streams (DVS_Sign_v2e) and recordings from a DAVIS346 camera (DVS_Sign). The data are divided into five classes (verbs, quantifiers, positions, things, and people), matching practical scenarios in which robots provide instruction or assistance. Sign language classification is demonstrated with spiking neural networks trained by a spatio-temporal back-propagation method, achieving a best recognition accuracy of 77%. This work paves the way for combining event camera-based sign language gesture recognition with robotic perception in future intelligent systems.
AB - Sign language recognition has been utilized in human–machine interaction, improving the lives of people with speech impairments or those who rely on nonverbal instructions. Thanks to their higher temporal resolution, lower visual redundancy, and lower energy consumption, event cameras based on the dynamic vision sensor (DVS) show promise for sign language recognition in robot perception and intelligent control. Previous work has focused on simple event camera gesture datasets, such as DVS128Gesture; the lack of event camera gesture datasets inspired by sign language remains a great impediment to the development of event camera-based sign language recognition, and an effective method to extract spatio-temporal features from event data is also needed. Event-based sign language gesture datasets are proposed, with data from two sources: traditional sign language videos converted to event streams (DVS_Sign_v2e) and recordings from a DAVIS346 camera (DVS_Sign). The data are divided into five classes (verbs, quantifiers, positions, things, and people), matching practical scenarios in which robots provide instruction or assistance. Sign language classification is demonstrated with spiking neural networks trained by a spatio-temporal back-propagation method, achieving a best recognition accuracy of 77%. This work paves the way for combining event camera-based sign language gesture recognition with robotic perception in future intelligent systems.
KW - DVS-sign language
KW - event camera
KW - intelligent system
KW - sign language recognition
KW - spiking neural network
UR - http://www.scopus.com/inward/record.url?scp=85149225824&partnerID=8YFLogxK
U2 - 10.3390/electronics12040786
DO - 10.3390/electronics12040786
M3 - Article
AN - SCOPUS:85149225824
SN - 2079-9292
VL - 12
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 4
M1 - 786
ER -