Dynamic Pathway for Query-Aware Feature Learning in Language-Driven Action Localization

Shuo Yang, Xinxiao Wu*, Zirui Shang, Jiebo Luo

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

1 citation (Scopus)

Abstract

Language-driven action localization aims to locate the segment of an untrimmed video that is semantically relevant to an input language query. The task is challenging because language queries describe diverse actions with different motion characteristics and semantic granularities. Some actions, such as 'the person takes off their shoes, and goes to the door', are characterized by complex motion relationships, while others, such as 'a person is standing holding a mirror in one hand', are distinguished by salient body postures. In this paper, we propose a dynamic pathway between an exploitation module and an exploration module for query-aware feature learning to handle this diversity of actions. The exploitation module works in a coarse-to-fine manner: it first learns features of general motion relationships to find a coarse segment of the target action, and then learns features of subtle motion changes to predict refined action boundaries. The exploration module works in a point-to-area diffusion fashion: it first learns features of sub-action patterns to find the salient postures of the target action, and then learns features of temporal dependency to expand the posture frames into the full action segment. The two modules are selected dynamically and adaptively, so that the model learns comprehensive representations of diverse actions and localizes them more accurately. Extensive experiments on the Charades-STA and TACoS datasets demonstrate that our method outperforms existing methods.

Original language: English
Pages (from-to): 7451-7461
Number of pages: 11
Journal: IEEE Transactions on Multimedia
Volume: 26
Publication status: Published - 2024
