Temporal Action Localization in the Deep Learning Era: A Survey

Binglu Wang; Yongqiang Zhao; Le Yang; Teng Long; Xuelong Li

doi:10.1109/TPAMI.2023.3330794

Binglu Wang, Yongqiang Zhao, Le Yang^*, Teng Long, Xuelong Li^*

^*Corresponding author for this work

School of Information and Electronics

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

The temporal action localization research aims to discover action instances from untrimmed videos, representing a fundamental step in the field of intelligent video understanding. With the advent of deep learning, backbone networks have been instrumental in providing representative spatiotemporal features, while the end-to-end learning paradigm has enabled the development of high-quality models through data-driven training. Both supervised and weakly supervised learning approaches have contributed to the rapid progress of temporal action localization, resulting in a multitude of methods and a large body of literature, making a comprehensive survey a pressing necessity. This paper presents a thorough analysis of existing action localization works, offering a well-organized taxonomy that highlights the strengths and weaknesses of each strategy. In the realm of supervised learning, in addition to the anchor mechanism, we introduce a novel classification mechanism to categorize and summarize existing works. Similarly, for weakly supervised learning, we extend the traditional pre-classification and post-classification mechanisms by providing a fresh perspective on enhancement strategies. Furthermore, we shed light on the bottleneck of confidence estimation, a critical yet overlooked aspect of current works. By conducting detailed analyses, this survey serves as a valuable resource for researchers, providing beneficial guidance to newcomers and inspiring seasoned researchers alike.

Original language	English
Pages (from-to)	2171-2190
Number of pages	20
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	46
Issue number	4
DOIs	https://doi.org/10.1109/TPAMI.2023.3330794
Publication status	Published - 1 Apr 2024

Keywords

Deep learning
supervised learning
survey
temporal action localization
weakly supervised learning

Cite this

Wang, B., Zhao, Y., Yang, L., Long, T., & Li, X. (2024). Temporal Action Localization in the Deep Learning Era: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(4), 2171-2190. https://doi.org/10.1109/TPAMI.2023.3330794

@article{685375c8ebc74a4482ee00e993aad678,

title = "Temporal Action Localization in the Deep Learning Era: A Survey",

abstract = "The temporal action localization research aims to discover action instances from untrimmed videos, representing a fundamental step in the field of intelligent video understanding. With the advent of deep learning, backbone networks have been instrumental in providing representative spatiotemporal features, while the end-to-end learning paradigm has enabled the development of high-quality models through data-driven training. Both supervised and weakly supervised learning approaches have contributed to the rapid progress of temporal action localization, resulting in a multitude of methods and a large body of literature, making a comprehensive survey a pressing necessity. This paper presents a thorough analysis of existing action localization works, offering a well-organized taxonomy that highlights the strengths and weaknesses of each strategy. In the realm of supervised learning, in addition to the anchor mechanism, we introduce a novel classification mechanism to categorize and summarize existing works. Similarly, for weakly supervised learning, we extend the traditional pre-classification and post-classification mechanisms by providing a fresh perspective on enhancement strategies. Furthermore, we shed light on the bottleneck of confidence estimation, a critical yet overlooked aspect of current works. By conducting detailed analyses, this survey serves as a valuable resource for researchers, providing beneficial guidance to newcomers and inspiring seasoned researchers alike.",

keywords = "Deep learning, supervised learning, survey, temporal action localization, weakly supervised learning",

author = "Binglu Wang and Yongqiang Zhao and Le Yang and Teng Long and Xuelong Li",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2024",

month = apr,

day = "1",

doi = "10.1109/TPAMI.2023.3330794",

language = "English",

volume = "46",

pages = "2171--2190",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "4",

}

TY - JOUR

T1 - Temporal Action Localization in the Deep Learning Era

T2 - A Survey

AU - Wang, Binglu

AU - Zhao, Yongqiang

AU - Yang, Le

AU - Long, Teng

AU - Li, Xuelong

PY - 2024/4/1

Y1 - 2024/4/1

N2 - The temporal action localization research aims to discover action instances from untrimmed videos, representing a fundamental step in the field of intelligent video understanding. With the advent of deep learning, backbone networks have been instrumental in providing representative spatiotemporal features, while the end-to-end learning paradigm has enabled the development of high-quality models through data-driven training. Both supervised and weakly supervised learning approaches have contributed to the rapid progress of temporal action localization, resulting in a multitude of methods and a large body of literature, making a comprehensive survey a pressing necessity. This paper presents a thorough analysis of existing action localization works, offering a well-organized taxonomy that highlights the strengths and weaknesses of each strategy. In the realm of supervised learning, in addition to the anchor mechanism, we introduce a novel classification mechanism to categorize and summarize existing works. Similarly, for weakly supervised learning, we extend the traditional pre-classification and post-classification mechanisms by providing a fresh perspective on enhancement strategies. Furthermore, we shed light on the bottleneck of confidence estimation, a critical yet overlooked aspect of current works. By conducting detailed analyses, this survey serves as a valuable resource for researchers, providing beneficial guidance to newcomers and inspiring seasoned researchers alike.

AB - The temporal action localization research aims to discover action instances from untrimmed videos, representing a fundamental step in the field of intelligent video understanding. With the advent of deep learning, backbone networks have been instrumental in providing representative spatiotemporal features, while the end-to-end learning paradigm has enabled the development of high-quality models through data-driven training. Both supervised and weakly supervised learning approaches have contributed to the rapid progress of temporal action localization, resulting in a multitude of methods and a large body of literature, making a comprehensive survey a pressing necessity. This paper presents a thorough analysis of existing action localization works, offering a well-organized taxonomy that highlights the strengths and weaknesses of each strategy. In the realm of supervised learning, in addition to the anchor mechanism, we introduce a novel classification mechanism to categorize and summarize existing works. Similarly, for weakly supervised learning, we extend the traditional pre-classification and post-classification mechanisms by providing a fresh perspective on enhancement strategies. Furthermore, we shed light on the bottleneck of confidence estimation, a critical yet overlooked aspect of current works. By conducting detailed analyses, this survey serves as a valuable resource for researchers, providing beneficial guidance to newcomers and inspiring seasoned researchers alike.

KW - Deep learning

KW - supervised learning

KW - survey

KW - temporal action localization

KW - weakly supervised learning

UR - http://www.scopus.com/inward/record.url?scp=85177063602&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2023.3330794

DO - 10.1109/TPAMI.2023.3330794

M3 - Article

C2 - 37930912

AN - SCOPUS:85177063602

SN - 0162-8828

VL - 46

SP - 2171

EP - 2190

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 4

ER -
