Counterfactual Inference for Visual Relationship Detection in Videos

Xiaofeng Ji, Jin Chen, Xinxiao Wu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Visual relationship detection in videos is challenging because it requires not only detecting static relationships but also inferring dynamic ones. Recent work has made progress by enriching visual representations through appearance and motion fusion or spatial and temporal reasoning, but it has not explored the intrinsic causality between representations and predictions. In this paper, we propose a novel counterfactual inference method for video relationship detection, which infers the causal effects of appearance, motion and language features on the predictions of static and dynamic relationships. Specifically, we first build a causal graph to represent the causality between features and relationship categories, then construct counterfactual scenes by intervening on the features to infer their effects on the prediction, and finally incorporate the inferred effects into relationship categorization by adaptively learning the weights of appearance, motion and language. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our method.
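
The three steps named in the abstract (predict, intervene on one feature, reweight by the inferred effect) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no formulas, so every name here (`fuse`, `branch_effects`, `effect_weights`), the zero-logit baseline used as the intervention, and the softmax-over-effect-magnitude weighting are assumptions chosen for illustration. In particular, the paper learns the weights adaptively, whereas this sketch computes them directly from the measured effects.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(branch_logits, weights=None):
    """Weighted sum of per-branch class logits, then softmax (assumed fusion rule)."""
    names = list(branch_logits)
    weights = weights or {n: 1.0 for n in names}
    n_classes = len(branch_logits[names[0]])
    fused = [sum(weights[n] * branch_logits[n][c] for n in names)
             for c in range(n_classes)]
    return softmax(fused)

def branch_effects(branch_logits):
    """Effect of each branch: factual prediction minus the prediction obtained
    when that branch is replaced by an uninformative all-zero baseline
    (a hypothetical stand-in for the paper's counterfactual scenes)."""
    factual = fuse(branch_logits)
    effects = {}
    for name in branch_logits:
        counterfactual_input = dict(branch_logits)
        counterfactual_input[name] = [0.0] * len(branch_logits[name])
        counterfactual = fuse(counterfactual_input)
        effects[name] = [f - c for f, c in zip(factual, counterfactual)]
    return effects

def effect_weights(effects):
    """Turn effect magnitudes into branch weights (hand-computed, not learned)."""
    names = list(effects)
    magnitudes = [sum(abs(e) for e in effects[n]) for n in names]
    return dict(zip(names, softmax(magnitudes)))

# Made-up logits for three relationship classes from three feature branches.
logits = {
    "appearance": [2.0, 0.5, 0.1],
    "motion": [0.1, 0.1, 0.1],      # constant scores: carries no class evidence
    "language": [0.5, 1.0, 0.2],
}
effects = branch_effects(logits)
weights = effect_weights(effects)
prediction = fuse(logits, weights)
```

In this toy setting the motion branch scores every class equally, so removing it leaves the prediction unchanged and its measured effect is zero; the appearance branch, which shifts the prediction the most, receives the largest weight.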

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Publisher: IEEE Computer Society
Pages: 162-167
Number of pages: 6
ISBN (Electronic): 9781665468916
Publication status: Published - 2023
Event: 2023 IEEE International Conference on Multimedia and Expo, ICME 2023 - Brisbane, Australia
Duration: 10 Jul 2023 to 14 Jul 2023

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2023-July
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Country/Territory: Australia
City: Brisbane
Period: 10/07/23 to 14/07/23

Keywords

  • Counterfactual Inference
  • Video Relationship Detection
  • Video Understanding
