Dynamic Pathway for Query-Aware Feature Learning in Language-Driven Action Localization

Shuo Yang, Xinxiao Wu*, Zirui Shang, Jiebo Luo

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Language-driven action localization aims to locate the segment of an untrimmed video that is semantically relevant to an input language query. This task is challenging because language queries describe diverse actions with different motion characteristics and semantic granularities. Some actions, such as 'the person takes off their shoes, and goes to the door', are characterized by complex motion relationships, while others, such as 'a person is standing holding a mirror in one hand', are distinguished by salient body postures. In this paper, we propose a dynamic pathway between an exploitation module and an exploration module for query-aware feature learning to handle the diversity of actions. The exploitation module works in a coarse-to-fine manner: it first learns features of general motion relationships to find a coarse segment of the target action, and then learns features of subtle motion changes to predict refined action boundaries. The exploration module works in a point-to-area diffusion fashion: it first learns features of sub-action patterns to find the salient postures of the target action, and then learns features of temporal dependencies to expand the posture frames into the action segment. The two modules are selected dynamically and adaptively to learn comprehensive representations of diverse actions and thereby improve localization accuracy. Extensive experiments on the Charades-STA and TACoS datasets demonstrate that our method outperforms existing methods.
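
As a reading aid only, the two search strategies and the query-dependent choice between them can be pictured with a toy sketch. This is not the authors' implementation: the per-frame relevance scores, the linear gate, the hard selection, and every name and threshold below (exploit, explore, localize, window, ratio, gate_weights) are assumptions made for illustration, and the learned features described in the abstract are replaced by fixed scores.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exploit(scores, window=4):
    """Coarse-to-fine stand-in (hypothetical).

    Coarse step: pick the fixed-length window with the highest total score.
    Fine step: trim boundary frames that score below the window mean.
    """
    n = len(scores)
    window = min(window, n)
    start = max(range(n - window + 1), key=lambda s: sum(scores[s:s + window]))
    end = start + window - 1
    mean = sum(scores[start:end + 1]) / window
    while start < end and scores[start] < mean:
        start += 1
    while end > start and scores[end] < mean:
        end -= 1
    return start, end


def explore(scores, ratio=0.5):
    """Point-to-area stand-in (hypothetical).

    Point step: find the single most salient frame.
    Area step: expand left and right while neighbours stay above
    ratio * peak score.
    """
    peak = max(range(len(scores)), key=lambda i: scores[i])
    threshold = ratio * scores[peak]
    start = end = peak
    while start > 0 and scores[start - 1] >= threshold:
        start -= 1
    while end < len(scores) - 1 and scores[end + 1] >= threshold:
        end += 1
    return start, end


MODULES = [("exploitation", exploit), ("exploration", explore)]


def localize(query_vec, gate_weights, scores):
    """Choose one pathway from the query vector, then localize with it.

    gate_weights is an assumed 2 x d matrix (one row per module); the
    hard pick of the more probable module is a simplification.
    """
    logits = [sum(w * q for w, q in zip(row, query_vec)) for row in gate_weights]
    probs = softmax(logits)
    choice = 0 if probs[0] >= probs[1] else 1
    name, module = MODULES[choice]
    return name, module(scores)
```

Calling localize with a query vector, a gate matrix, and a list of per-frame scores returns the name of the selected pathway together with inclusive (start, end) frame indices. The point of the sketch is only that different queries can route the same video through different search strategies and end up with different segments.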

Original language: English
Pages (from-to): 7451-7461
Number of pages: 11
Journal: IEEE Transactions on Multimedia
Volume: 26
DOIs
Publication status: Published - 2024

Keywords

  • Dynamic pathway
  • exploitation
  • exploration
  • language-driven action localization
  • video grounding
  • video moment retrieval
