Counterfactual Inference for Visual Relationship Detection in Videos

Xiaofeng Ji, Jin Chen, Xinxiao Wu*

*Corresponding author of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Visual relationship detection in videos is a challenging task since it requires not only detecting static relationships but also inferring dynamic ones. Recent progress has been made by enriching visual representations through appearance and motion fusion or spatial and temporal reasoning, but without exploring the intrinsic causality between representations and predictions. In this paper, we propose a novel counterfactual inference method for video relationship detection, which infers the causal effects of appearance, motion, and language features on the predictions of static and dynamic relationships. Specifically, starting with building a causal graph to represent the causality between features and relationship categories, we then construct counterfactual scenes by intervening on the features to infer their effects on prediction, and finally incorporate the inferred effects into the relationship categorization by adaptively learning the weights of appearance, motion, and language. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our method.

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Publisher: IEEE Computer Society
Pages: 162-167
Number of pages: 6
ISBN (electronic): 9781665468916
DOI
Publication status: Published - 2023
Event: 2023 IEEE International Conference on Multimedia and Expo, ICME 2023 - Brisbane, Australia
Duration: 10 Jul 2023 - 14 Jul 2023

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2023-July
ISSN (print): 1945-7871
ISSN (electronic): 1945-788X

Conference

Conference: 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Country/Territory: Australia
City: Brisbane
Period: 10/07/23 - 14/07/23
