A survey of video human action recognition based on deep learning

Chun Yan Bi; Yue Liu

doi:10.11996/JG.j.2095-302X.2023040625

Corresponding author: Yue Liu

School of Optics and Photonics

Research output: Contribution to journal › Review article › Peer-review

1 citation (Scopus)

Abstract

With the rapid advancement of network multimedia technology and the continuous improvement of video capture equipment, an increasing number of videos are being shared on online platforms, and video has gradually become an integral part of daily life. Consequently, video understanding has become one of the research hotspots in computer vision, and action recognition is a pivotal task within it. Deep-learning-based methods for 2D image recognition and classification have already made significant progress, but video action recognition remains a formidable challenge. The reason is that a video adds a temporal dimension to 2D images: understanding actions such as walking, running, high jumping, and long jumping requires not only the spatial semantic information that 2D images carry but also temporal information. Effectively utilizing the temporal information in videos is therefore critical for action recognition. This paper first introduced the research background and development of action recognition and analyzed the current challenges in video action recognition. It then presented temporal modeling and parameter optimization methods in detail and examined the commonly used action recognition datasets and evaluation metrics. Finally, it outlined future research directions in this field.
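To make the abstract's central point concrete, the following is a minimal illustrative sketch, not a method taken from the survey. It assumes a clip is stored as a NumPy array of shape (T, H, W, C), i.e. a stack of 2D frames along an extra time axis, and compares a purely spatial descriptor (averaging over frames, which discards frame order) with frame differencing, a simple temporal cue. The function names, the array layout, and the toy brightness-ramp clip are hypothetical choices made only for this example.

```python
import numpy as np


def spatial_only_features(clip: np.ndarray) -> np.ndarray:
    """Average over time and space, like pooling per-frame 2D features over time.

    The result is the same for any reordering of the frames, so the
    direction of motion is lost.
    """
    # clip: (T, H, W, C) -> one value per channel, shape (C,)
    return clip.mean(axis=(0, 1, 2))


def temporal_difference_features(clip: np.ndarray) -> np.ndarray:
    """Average of frame-to-frame differences, a simple motion cue.

    Reversing the clip flips the sign of every difference, so this
    descriptor depends on frame order.
    """
    diffs = clip[1:] - clip[:-1]  # shape (T-1, H, W, C)
    return diffs.mean(axis=(0, 1, 2))


# Hypothetical toy clip: every pixel of frame t has value t (a brightness ramp).
T, H, W, C = 8, 4, 4, 3
clip = np.arange(T, dtype=float)[:, None, None, None] * np.ones((T, H, W, C))
reversed_clip = clip[::-1]  # the same frames played backwards

print(spatial_only_features(clip), spatial_only_features(reversed_clip))
print(temporal_difference_features(clip), temporal_difference_features(reversed_clip))
```

Both clips yield the same spatial-only features (3.5 per channel), while the frame-difference features are +1 for the forward clip and -1 for the reversed one. A purely spatial view therefore cannot separate a motion from its reverse, which is why temporal modeling matters for recognizing actions.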

Original language	English
Pages (from-to)	625-639
Number of pages	15
Journal	Journal of Graphics
Volume	44
Issue number	4
DOI	https://doi.org/10.11996/JG.j.2095-302X.2023040625
Publication status	Published - 31 Aug 2023

Access to Document

10.11996/JG.j.2095-302X.2023040625


Cite this

@article{f6ba157729ad4c4285d690de0338a6ea,

title = "A survey of video human action recognition based on deep learning",

keywords = "Action recognition, Computer vision, Convolutional neural network, Deep learning, Video understanding",

author = "Bi, {Chun Yan} and Yue Liu",

year = "2023",

month = aug,

day = "31",

doi = "10.11996/JG.j.2095-302X.2023040625",

language = "English",

volume = "44",

pages = "625--639",

journal = "Journal of Graphics",

issn = "2095-302X",

publisher = "Editorial Board of Journal of Graphics",

number = "4",

}

TY - JOUR

T1 - A survey of video human action recognition based on deep learning

AU - Bi, Chun Yan

AU - Liu, Yue

PY - 2023/8/31

Y1 - 2023/8/31

KW - Action recognition

KW - Computer vision

KW - Convolutional neural network

KW - Deep learning

KW - Video understanding

UR - http://www.scopus.com/inward/record.url?scp=85172115606&partnerID=8YFLogxK

U2 - 10.11996/JG.j.2095-302X.2023040625

DO - 10.11996/JG.j.2095-302X.2023040625

M3 - Review article

AN - SCOPUS:85172115606

SN - 2095-302X

VL - 44

SP - 625

EP - 639

JO - Journal of Graphics

JF - Journal of Graphics

IS - 4

ER -
