Entity-aware and Motion-aware Transformers for Language-driven Action Localization

Shuo Yang, Xinxiao Wu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

8 Citations (Scopus)

Abstract

Language-driven action localization in videos is a challenging task that involves not only visual-linguistic matching but also action boundary prediction. Recent progress has been achieved through aligning language query to video segments, but estimating precise boundaries is still under-explored. In this paper, we propose entity-aware and motion-aware Transformers that progressively localizes actions in videos by first coarsely locating clips with entity queries and then finely predicting exact boundaries in a shrunken temporal region with motion queries. The entity-aware Transformer incorporates the textual entities into visual representation learning via cross-modal and cross-frame attentions to facilitate attending action-related video clips. The motion-aware Transformer captures fine-grained motion changes at multiple temporal scales via integrating long short-term memory into the self-attention module to further improve the precision of action boundary prediction. Extensive experiments on the Charades-STA and TACoS datasets demonstrate that our method achieves better performance than existing methods.

Original languageEnglish
Title of host publicationProceedings of the 31st International Joint Conference on Artificial Intelligence, IJCAI 2022
EditorsLuc De Raedt, Luc De Raedt
PublisherInternational Joint Conferences on Artificial Intelligence
Pages1552-1558
Number of pages7
ISBN (Electronic)9781956792003
Publication statusPublished - 2022
Event31st International Joint Conference on Artificial Intelligence, IJCAI 2022 - Vienna, Austria
Duration: 23 Jul 202229 Jul 2022

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence
ISSN (Print)1045-0823

Conference

Conference31st International Joint Conference on Artificial Intelligence, IJCAI 2022
Country/TerritoryAustria
CityVienna
Period23/07/2229/07/22

Fingerprint

Dive into the research topics of 'Entity-aware and Motion-aware Transformers for Language-driven Action Localization'. Together they form a unique fingerprint.

Cite this