TY - GEN
T1 - A Temporal Action Detection Model Based on Deep Reinforcement Learning
AU - Han, Zhaojia
AU - Li, Kan
AU - Qu, Shaojie
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Existing methods for temporal action detection typically follow a two-stage approach, generating numerous action proposals at various scales on untrimmed long videos. These proposals are then used with a pre-trained action classifier to estimate the action type and accuracy for each proposal. This method not only generates a large number of irrelevant proposals, leading to significant computational overhead, but also deviates from the human perception process. This paper solves the temporal action detection problem from a human cognition perspective, considering it as a process of observing and refining proposals to locate temporal actions. The paper introduces a deep reinforcement learning-based Temporal Action Detection model that leverages deep learning networks to understand the temporal information in video sequences. Through reinforcement learning, the agent learns strategies to change the position of proposals, aiming to locate different temporal actions. The Action Buffer module records the actions performed by the proposals, while the Proposal Regression Network refines the position deviation between the predicted results and labels, ensuring more accurate results. Correct proposals are stored in the Proposal Buffer module to avoid redundant predictions. Experimental results on the THUMOS'14 dataset confirm the accuracy of the model, achieving excellent results in metrics such as AP values and recall.
KW - deep learning
KW - reinforcement learning
KW - temporal action detection
UR - http://www.scopus.com/inward/record.url?scp=85186125799&partnerID=8YFLogxK
U2 - 10.1109/ICSAI61474.2023.10423353
DO - 10.1109/ICSAI61474.2023.10423353
M3 - Conference contribution
AN - SCOPUS:85186125799
T3 - ICSAI 2023 - 9th International Conference on Systems and Informatics
BT - ICSAI 2023 - 9th International Conference on Systems and Informatics
A2 - Yao, Shaowen
A2 - He, Zhenli
A2 - Xiao, Zheng
A2 - Tu, Wanqing
A2 - Li, Kenli
A2 - Wang, Lipo
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th International Conference on Systems and Informatics, ICSAI 2023
Y2 - 16 December 2023 through 18 December 2023
ER -