Hierarchical Matching and Reasoning for Action Localization via Language Query

Tianyu Li, Xinxiao Wu*

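*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract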

This paper strives for temporal localization of actions in untrimmed videos via natural language queries. Prevailing methods represent both the query sentence and the video as a whole and perform sentence-video matching via global features, which neglects the local correspondence between sentence and video. In this work, we aim to move beyond this limitation by delving into fine-grained local sentence-video matching, such as phrase-motion matching and word-object matching. We propose a hierarchical matching and reasoning method based on a deep conditional random field (CRF) that integrates hierarchical matching between visual concepts and textual semantics for temporal action localization via query sentences. Our method decomposes each sentence into textual semantics (i.e., phrases and words), obtains multi-level matching results between the textual semantics and the visual concepts in a video (i.e., the results of phrase-motion matching and word-object matching), and then reasons about the relations among these multi-level matching results via the pairwise potentials of the CRF to achieve coherence in hierarchical matching. By minimizing the overall potential, the final matching score between a sentence and a video is computed as the conditional probability of the CRF. The proposed method is evaluated on the public Charades-STA dataset, and the experimental results verify its superiority over state-of-the-art methods.
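The abstract describes the model only at a high level. As a rough illustration of how unary matching scores and pairwise coherence potentials can combine into a single conditional probability, the following Python sketch builds a tiny binary CRF by hand. Everything here is an assumption, not the authors' method: each phrase-motion or word-object pair is treated as a binary "matched" variable, unary potentials are negative log matching scores, pairwise potentials are Potts penalties between linked units, and inference is brute-force enumeration. The paper's deep CRF would obtain its potentials from neural networks, whereas here they are fixed numbers, and the names `crf_matching`, `crf_energy`, and `unary_energy` are hypothetical.

```python
import itertools
import math


def unary_energy(label, score, eps=1e-6):
    # Assumed unary potential: negative log-probability of a binary label given
    # a matching score in (0, 1); scores are clipped so log() never sees 0.
    s = min(max(score, eps), 1.0 - eps)
    return -math.log(s) if label == 1 else -math.log(1.0 - s)


def crf_energy(labels, scores, edges, weight):
    # Overall potential = unary terms + Potts pairwise terms. Each linked pair
    # (i, j) whose labels disagree pays `weight`, rewarding coherent matching.
    energy = sum(unary_energy(y, s) for y, s in zip(labels, scores))
    energy += sum(weight for i, j in edges if labels[i] != labels[j])
    return energy


def crf_matching(scores, edges, weight=1.0):
    # Exact inference by enumerating all 2^n labelings (only viable for small n).
    n = len(scores)
    labelings = list(itertools.product((0, 1), repeat=n))
    energies = [crf_energy(y, scores, edges, weight) for y in labelings]
    # Shift by the minimum energy before exponentiating for numerical
    # stability; the shift cancels out when normalizing by Z.
    e_min = min(energies)
    unnorm = [math.exp(-(e - e_min)) for e in energies]
    z = sum(unnorm)
    probs = [u / z for u in unnorm]
    # Minimum-energy (MAP) labeling and its conditional probability P(y* | x).
    best = min(range(len(labelings)), key=lambda k: energies[k])
    # Conditional probability that every unit is labeled "matched".
    p_all_matched = probs[labelings.index((1,) * n)]
    return labelings[best], probs[best], p_all_matched


if __name__ == "__main__":
    # Hypothetical example: unit 0 is a phrase-motion match (e.g. "opens the
    # door"), units 1 and 2 are word-object matches (e.g. "person", "door")
    # linked to that phrase by pairwise edges.
    scores = [0.9, 0.8, 0.3]
    edges = [(0, 1), (0, 2)]
    labeling, p_map, p_all = crf_matching(scores, edges, weight=1.0)
    print("MAP labeling:", labeling)
    print("P(MAP labeling | x): %.3f" % p_map)
    print("P(all matched | x): %.3f" % p_all)
```

With no edges the model factorizes, so the probability that all units match is simply the product of the scores. Adding edges lets a confident phrase-level match pull a weak word-level match toward agreement. The abstract does not specify which probability serves as the final sentence-video score, so the sketch returns both the MAP labeling's probability and the all-matched probability.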

Original language: English
Host publication title: Pattern Recognition and Computer Vision - 3rd Chinese Conference, PRCV 2020, Proceedings
Editors: Yuxin Peng, Hongbin Zha, Qingshan Liu, Huchuan Lu, Zhenan Sun, Chenglin Liu, Xilin Chen, Jian Yang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 137-148
Number of pages: 12
ISBN (Print): 9783030606350
DOI: https://doi.org/10.1007/978-3-030-60636-7_12
Publication status: Published - 2020
Event: 3rd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2020 - Nanjing, China
Duration: 16 Oct 2020 - 18 Oct 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12307 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2020
Country/Territory: China
City: Nanjing
Period: 16/10/20 - 18/10/20

Cite this

Li, T., & Wu, X. (2020). Hierarchical Matching and Reasoning for Action Localization via Language Query. In Y. Peng, H. Zha, Q. Liu, H. Lu, Z. Sun, C. Liu, X. Chen, & J. Yang (Eds.), Pattern Recognition and Computer Vision - 3rd Chinese Conference, PRCV 2020, Proceedings (pp. 137-148). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12307 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60636-7_12