Attention Guided Relation Detection Approach for Video Visual Relation Detection

Qianwen Cao, Heyan Huang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Video Visual Relation Detection (VidVRD) aims at detecting relation instances between two observed objects in the form of <subject-predicate-object>. Unlike image visual relation detection, the introduction of the time dimension requires both the various predicates and the spatio-temporal locations to be tackled, making the task challenging. To balance these challenges, most existing works perform the task in two phases: first predicting relationships in segmented clips to capture the motions, and then associating them into relation instances with proper locations in videos. These works detect different relationships by collecting cues from multiple aspects, but treat the cues equally without distinction. Furthermore, owing to dynamic scenes and the drifting problem in object tracking, the rigid spatial overlap used to decide associations in previous works is insufficient, which leads to missing associations. To address these problems, in this paper we propose a novel attention-guided relation detection approach for VidVRD. To model the distinctions among different cues and strengthen the salient characteristics, we assign attention weights to these cues for relationship prediction and association decision-making. In addition, to comprehensively determine whether to merge relationships, we put forward a customized network that takes both visual appearance and geometric location into account. Extensive experimental results on the ImageNet-VidVRD and VidOR datasets demonstrate the effectiveness of the proposed approach, and abundant ablation studies verify that the components designed in the approach are essential.
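As a rough illustration of the attention-weighted cue fusion described in the abstract, the sketch below (PyTorch) scores each cue feature of a relationship candidate, normalizes the scores with a softmax, and fuses the cues by their attention weights. The module name, feature dimension, and cue layout are assumptions for illustration only, not the authors' implementation.

```python
# Minimal sketch (hypothetical, not the paper's code) of attention-weighted cue fusion.
import torch
import torch.nn as nn


class CueAttentionFusion(nn.Module):
    """Assign an attention weight to each cue and return the weighted sum of cue features."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # one scalar score per cue

    def forward(self, cues: torch.Tensor) -> torch.Tensor:
        # cues: (batch, num_cues, feat_dim), e.g. subject, object, and motion cues
        weights = torch.softmax(self.score(cues), dim=1)  # (batch, num_cues, 1)
        return (weights * cues).sum(dim=1)                # (batch, feat_dim)


if __name__ == "__main__":
    fusion = CueAttentionFusion(feat_dim=256)
    cues = torch.randn(4, 3, 256)   # 4 candidates, 3 cues each (assumed)
    print(fusion(cues).shape)       # torch.Size([4, 256])
```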

Original language: English
Pages (from-to): 3896-3907
Number of pages: 12
Journal: IEEE Transactions on Multimedia
Volume: 24
DOI
Publication status: Published - 2022
