TY - JOUR
T1 - Sign Language Gesture Recognition and Classification Based on Event Camera with Spiking Neural Networks
AU - Chen, Xuena
AU - Su, Li
AU - Zhao, Jinxiu
AU - Qiu, Keni
AU - Jiang, Na
AU - Zhai, Guang
N1 - Publisher Copyright:
© 2023 by the authors.
PY - 2023/2
Y1 - 2023/2
N2 - Sign language recognition has been utilized in human–machine interaction, improving the lives of people with speech impairments or those who rely on nonverbal instructions. Thanks to their higher temporal resolution, lower visual redundancy, and lower energy consumption, event cameras based on the dynamic vision sensor (DVS) show promise for sign language recognition in robot perception and intelligent control. Previous work has focused on simple event camera gesture datasets, such as DVS128Gesture; the lack of event camera gesture datasets inspired by sign language remains a great impediment to the development of event camera-based sign language recognition, and an effective method to extract spatio-temporal features from event data is also needed. Event-based sign language gesture datasets are proposed, with data from two sources: traditional sign language videos converted to event streams (DVS_Sign_v2e) and recordings from a DAVIS346 camera (DVS_Sign). The data are divided into five classes (verbs, quantifiers, positions, things, and people), matching practical scenarios in which robots provide instruction or assistance. Sign language classification is demonstrated with spiking neural networks trained by a spatio-temporal back-propagation method, achieving a best recognition accuracy of 77%. This work paves the way for combining event camera-based sign language gesture recognition with robotic perception in future intelligent systems.
AB - Sign language recognition has been utilized in human–machine interaction, improving the lives of people with speech impairments or those who rely on nonverbal instructions. Thanks to their higher temporal resolution, lower visual redundancy, and lower energy consumption, event cameras based on the dynamic vision sensor (DVS) show promise for sign language recognition in robot perception and intelligent control. Previous work has focused on simple event camera gesture datasets, such as DVS128Gesture; the lack of event camera gesture datasets inspired by sign language remains a great impediment to the development of event camera-based sign language recognition, and an effective method to extract spatio-temporal features from event data is also needed. Event-based sign language gesture datasets are proposed, with data from two sources: traditional sign language videos converted to event streams (DVS_Sign_v2e) and recordings from a DAVIS346 camera (DVS_Sign). The data are divided into five classes (verbs, quantifiers, positions, things, and people), matching practical scenarios in which robots provide instruction or assistance. Sign language classification is demonstrated with spiking neural networks trained by a spatio-temporal back-propagation method, achieving a best recognition accuracy of 77%. This work paves the way for combining event camera-based sign language gesture recognition with robotic perception in future intelligent systems.
KW - DVS-sign language
KW - event camera
KW - intelligent system
KW - sign language recognition
KW - spiking neural network
UR - http://www.scopus.com/inward/record.url?scp=85149225824&partnerID=8YFLogxK
U2 - 10.3390/electronics12040786
DO - 10.3390/electronics12040786
M3 - Article
AN - SCOPUS:85149225824
SN - 2079-9292
VL - 12
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 4
M1 - 786
ER -