TY - GEN
T1 - Learning weighted video segments for temporal action localization
AU - Sun, Che
AU - Song, Hao
AU - Wu, Xinxiao
AU - Jia, Yunde
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
N2 - This paper proposes a novel approach of learning weighted video segments via supervised temporal attention for action localization in untrimmed videos. The learned segment weights represent informativeness of video segments to recognize actions and benefit inferring the boundaries to temporally localize actions. We build a Supervised Temporal Attention Network (STAN) to dynamically learn the weights of video segments, and generate descriptive and discriminative video representations. We use a proposal generator and a classifier to estimate the boundaries of actions and classify the classes of actions, respectively. Extensive experiments are conducted on two public benchmarks THUMOS2014 and ActivityNet1.3. The results demonstrate that our approach achieves substantially better performance than the state-of-the-art methods, verifying the effectiveness of learning weighted video segments.
KW - Attention mechanism
KW - Temporal action localization
KW - Weighted video segments
UR - http://www.scopus.com/inward/record.url?scp=85086145329&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-31654-9_16
DO - 10.1007/978-3-030-31654-9_16
M3 - Conference contribution
AN - SCOPUS:85086145329
SN - 9783030316532
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 181
EP - 192
BT - Pattern Recognition and Computer Vision - 2nd Chinese Conference, PRCV 2019, Proceedings, Part I
A2 - Lin, Zhouchen
A2 - Wang, Liang
A2 - Tan, Tieniu
A2 - Yang, Jian
A2 - Shi, Guangming
A2 - Zheng, Nanning
A2 - Chen, Xilin
A2 - Zhang, Yanning
PB - Springer
T2 - 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019
Y2 - 8 November 2019 through 11 November 2019
ER -