Temporal Action Localization in the Deep Learning Era: A Survey

Binglu Wang, Yongqiang Zhao, Le Yang*, Teng Long, Xuelong Li*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Temporal action localization aims to discover action instances in untrimmed videos and is a fundamental step in intelligent video understanding. With the advent of deep learning, backbone networks have been instrumental in providing representative spatiotemporal features, while the end-to-end learning paradigm has enabled the development of high-quality models through data-driven training. Both supervised and weakly supervised learning approaches have contributed to the rapid progress of temporal action localization, producing a multitude of methods and a large body of literature that makes a comprehensive survey a pressing necessity. This paper presents a thorough analysis of existing action localization works, offering a well-organized taxonomy that highlights the strengths and weaknesses of each strategy. In the realm of supervised learning, in addition to the anchor mechanism, we introduce a novel classification mechanism to categorize and summarize existing works. Similarly, for weakly supervised learning, we extend the traditional pre-classification and post-classification mechanisms by providing a fresh perspective on enhancement strategies. Furthermore, we shed light on the bottleneck of confidence estimation, a critical yet overlooked aspect of current works. Through detailed analyses, this survey serves as a valuable resource for researchers, offering guidance to newcomers and inspiration to seasoned researchers alike.

Original language: English
Pages (from-to): 2171-2190
Number of pages: 20
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 46
Issue number: 4
Publication status: Published - 1 Apr 2024

Keywords

  • Deep learning
  • supervised learning
  • survey
  • temporal action localization
  • weakly supervised learning
