3D Contextual Transformer & Double Inception Network for Human Action Recognition

Enqi Liu, Kaoru Hirota, Chang Liu, Yaping Dai*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A 3D Contextual Transformer & Double Inception Network, called CoTDIL-Net, is proposed for human action recognition. A spatio-temporal enrichment module based on a 3D Contextual Transformer (CoT3D) is proposed to enhance the features of adjacent frames. In addition, 3D Inception and 2D Inception modules are combined to form a feature extractor, called DIFE, that captures short-term contextual features. Moreover, an LSTM is used to obtain long-term action change features, and a multi-stream input framework is introduced to obtain richer contextual information. Compared with single-convolution methods, the network aims to obtain multi-scale spatio-temporal features: CoT3D incorporates contextual action information, DIFE captures short-term features, and the LSTM fuses long-term features. Experiments are carried out on a laptop with 32 GB RAM and a GeForce RTX 3070 8 GB GPU using the KTH dataset, and the results show a recognition accuracy of 97.2%. The obtained results indicate that the proposed CoTDIL-Net improves the convolutional structure's understanding of changes in human actions.
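The record gives only a high-level description of the architecture. The minimal PyTorch sketch below illustrates one way the described pipeline (CoT3D enrichment, DIFE short-term feature extraction, LSTM long-term fusion, classifier) could be composed; all module names, layer choices, and channel sizes here are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class CoT3D(nn.Module):
    """Assumed contextual enrichment block: fuses each frame with its
    spatio-temporal neighbourhood via a 3D convolution (illustrative only)."""
    def __init__(self, channels):
        super().__init__()
        self.context = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.mix = nn.Conv3d(2 * channels, channels, kernel_size=1)

    def forward(self, x):              # x: (N, C, T, H, W)
        ctx = self.context(x)          # contextual features from adjacent frames
        return self.mix(torch.cat([x, ctx], dim=1))


class DIFE(nn.Module):
    """Assumed double-Inception feature extractor: a 3D branch for short-term
    motion and a per-frame 2D branch for appearance (illustrative only)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch3d = nn.Conv3d(in_ch, out_ch // 2, kernel_size=3, padding=1)
        self.branch2d = nn.Conv3d(in_ch, out_ch // 2, kernel_size=(1, 3, 3),
                                  padding=(0, 1, 1))  # 2D conv shared over time
        self.pool = nn.AdaptiveAvgPool3d((None, 1, 1))

    def forward(self, x):              # x: (N, C, T, H, W)
        y = torch.cat([self.branch3d(x), self.branch2d(x)], dim=1)
        return self.pool(y).flatten(2).transpose(1, 2)   # (N, T, out_ch)


class CoTDILNetSketch(nn.Module):
    """Pipeline: CoT3D enrichment -> DIFE short-term features -> LSTM
    long-term fusion -> classifier (KTH has 6 action classes)."""
    def __init__(self, in_ch=3, feat=64, classes=6):
        super().__init__()
        self.cot = CoT3D(in_ch)
        self.dife = DIFE(in_ch, feat)
        self.lstm = nn.LSTM(feat, feat, batch_first=True)
        self.fc = nn.Linear(feat, classes)

    def forward(self, clip):           # clip: (N, 3, T, H, W)
        seq, _ = self.lstm(self.dife(self.cot(clip)))
        return self.fc(seq[:, -1])     # classify from the last time step


if __name__ == "__main__":
    logits = CoTDILNetSketch()(torch.randn(2, 3, 16, 64, 64))
    print(logits.shape)                # torch.Size([2, 6])

The multi-stream input framework mentioned in the abstract is omitted from this sketch; it could be approximated by running several such branches on different inputs and fusing their features before the classifier.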

Original language: English
Title of host publication: Proceedings of the 35th Chinese Control and Decision Conference, CCDC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1795-1800
Number of pages: 6
ISBN (electronic): 9798350334722
DOI
Publication status: Published - 2023
Event: 35th Chinese Control and Decision Conference, CCDC 2023 - Yichang, China
Duration: 20 May 2023 - 22 May 2023

Publication series

Name: Proceedings of the 35th Chinese Control and Decision Conference, CCDC 2023

Conference

Conference: 35th Chinese Control and Decision Conference, CCDC 2023
Country/Territory: China
City: Yichang
Period: 20/05/23 - 22/05/23
