Counterfactual Inference for Visual Relationship Detection in Videos

Xiaofeng Ji; Jin Chen; Xinxiao Wu

doi:10.1109/ICME55011.2023.00036

Xiaofeng Ji, Jin Chen, Xinxiao Wu*

*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Visual relationship detection in videos is challenging because it requires not only detecting static relationships but also inferring dynamic ones. Recent progress has come from enriching visual representations through appearance and motion fusion or spatial and temporal reasoning, but without exploring the intrinsic causality between representations and predictions. In this paper, we propose a novel counterfactual inference method for video relationship detection that infers the causal effects of appearance, motion, and language features on the predictions of static and dynamic relationships. Specifically, we first build a causal graph to represent the causality between features and relationship categories, then construct counterfactual scenes by intervening on the features to infer their effects on prediction, and finally incorporate the inferred effects into relationship categorization by adaptively learning the weights of appearance, motion, and language. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our method.
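The intervention step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear per-branch scorer, the zero-vector reference used for the intervention, the softmax that turns effects into weights, and every name and number below are assumptions made only for illustration. The sketch computes, for each feature, the factual probability of a target relationship minus the probability obtained after that feature is replaced by the reference.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, branch_weights):
    """Toy predictor (assumed): each branch scores every class linearly; scores are summed."""
    num_classes = len(next(iter(branch_weights.values())))
    logits = [0.0] * num_classes
    for name, x in features.items():
        for c, w_row in enumerate(branch_weights[name]):
            logits[c] += sum(w * v for w, v in zip(w_row, x))
    return softmax(logits)


def feature_effects(features, branch_weights, target):
    """Effect of a feature = factual minus counterfactual probability of the target class."""
    factual = predict(features, branch_weights)[target]
    effects = {}
    for name in features:
        counterfactual = dict(features)
        # Intervention (assumed): replace the feature with an all-zero reference.
        counterfactual[name] = [0.0] * len(features[name])
        effects[name] = factual - predict(counterfactual, branch_weights)[target]
    return effects


# Hypothetical two-class setup: class 0 is a static relation, class 1 a dynamic one.
features = {
    "appearance": [1.0, 0.0],
    "motion": [0.0, 1.0],
    "language": [1.0, 1.0],
}
branch_weights = {
    "appearance": [[1.0, 0.0], [0.0, 0.0]],  # supports the static class only
    "motion": [[0.0, 0.0], [0.0, 2.0]],      # supports the dynamic class only
    "language": [[0.0, 0.0], [0.0, 0.0]],    # carries no signal in this toy case
}

effects = feature_effects(features, branch_weights, target=1)
# Stand-in for the learned weighting: a softmax over the effects (assumed).
weights = dict(zip(effects, softmax(list(effects.values()))))
print(effects)
print(weights)
```

In this toy setup, removing the motion feature lowers the probability of the dynamic class, so motion receives the largest weight, while the language branch has no effect because its weights are zero. The paper learns the weighting adaptively; the fixed softmax here only shows how inferred effects could feed into a fusion step.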

Original language	English
Title of host publication	Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Publisher	IEEE Computer Society
Pages	162-167
Number of pages	6
ISBN (Electronic)	9781665468916
DOIs	https://doi.org/10.1109/ICME55011.2023.00036
Publication status	Published - 2023
Event	2023 IEEE International Conference on Multimedia and Expo, ICME 2023 - Brisbane, Australia
Duration	10 Jul 2023 → 14 Jul 2023

Publication series

Name	Proceedings - IEEE International Conference on Multimedia and Expo
Volume	2023-July
ISSN (Print)	1945-7871
ISSN (Electronic)	1945-788X

Conference

Conference	2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Country/Territory	Australia
City	Brisbane
Period	10/07/23 → 14/07/23

Keywords

Counterfactual Inference
Video Relationship Detection
Video Understanding

Access to Document

10.1109/ICME55011.2023.00036

Cite this

@inproceedings{b16d68ca2e30428f9f989f7015fc4024,
  title = "Counterfactual Inference for Visual Relationship Detection in Videos",
  abstract = "Visual relationship detection in videos is a challenging task since it requires not only to detect static relationships but also to infer dynamic relationships. Recent progress has been made through enriching visual representations by appearance and motion fusion or spatial and temporal reasoning, but without exploring the intrinsic causality between representations and predictions. In this paper, we propose a novel counterfactual inference method for video relationship detection, which infers the causal effects of appearance, motion and language features on the predictions of static and dynamic relationships. Specifically, starting with building a causal graph to represent the causality between features and relationship categories, we then construct counterfactual scenes by intervening the features to infer their effects on prediction, and finally incorporate the inferred effects into the relationship categorization by adaptively learning the weights of appearance, motion and language. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our method.",
  keywords = "Counterfactual Inference, Video Relationship Detection, Video Understanding",
  author = "Xiaofeng Ji and Jin Chen and Xinxiao Wu",
  note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE International Conference on Multimedia and Expo, ICME 2023 ; Conference date: 10-07-2023 Through 14-07-2023",
  year = "2023",
  doi = "10.1109/ICME55011.2023.00036",
  language = "English",
  series = "Proceedings - IEEE International Conference on Multimedia and Expo",
  publisher = "IEEE Computer Society",
  pages = "162--167",
  booktitle = "Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023",
  address = "United States",
}

Ji, X, Chen, J & Wu, X 2023, Counterfactual Inference for Visual Relationship Detection in Videos. in Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023. Proceedings - IEEE International Conference on Multimedia and Expo, vol. 2023-July, IEEE Computer Society, pp. 162-167, 2023 IEEE International Conference on Multimedia and Expo, ICME 2023, Brisbane, Australia, 10/07/23. https://doi.org/10.1109/ICME55011.2023.00036

Counterfactual Inference for Visual Relationship Detection in Videos. / Ji, Xiaofeng; Chen, Jin; Wu, Xinxiao.
Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023. IEEE Computer Society, 2023. p. 162-167 (Proceedings - IEEE International Conference on Multimedia and Expo; Vol. 2023-July).

TY - GEN
T1 - Counterfactual Inference for Visual Relationship Detection in Videos
AU - Ji, Xiaofeng
AU - Chen, Jin
AU - Wu, Xinxiao
PY - 2023
Y1 - 2023
N2 - Visual relationship detection in videos is a challenging task since it requires not only to detect static relationships but also to infer dynamic relationships. Recent progress has been made through enriching visual representations by appearance and motion fusion or spatial and temporal reasoning, but without exploring the intrinsic causality between representations and predictions. In this paper, we propose a novel counterfactual inference method for video relationship detection, which infers the causal effects of appearance, motion and language features on the predictions of static and dynamic relationships. Specifically, starting with building a causal graph to represent the causality between features and relationship categories, we then construct counterfactual scenes by intervening the features to infer their effects on prediction, and finally incorporate the inferred effects into the relationship categorization by adaptively learning the weights of appearance, motion and language. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our method.
AB - Visual relationship detection in videos is a challenging task since it requires not only to detect static relationships but also to infer dynamic relationships. Recent progress has been made through enriching visual representations by appearance and motion fusion or spatial and temporal reasoning, but without exploring the intrinsic causality between representations and predictions. In this paper, we propose a novel counterfactual inference method for video relationship detection, which infers the causal effects of appearance, motion and language features on the predictions of static and dynamic relationships. Specifically, starting with building a causal graph to represent the causality between features and relationship categories, we then construct counterfactual scenes by intervening the features to infer their effects on prediction, and finally incorporate the inferred effects into the relationship categorization by adaptively learning the weights of appearance, motion and language. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our method.
KW - Counterfactual Inference
KW - Video Relationship Detection
KW - Video Understanding
UR - http://www.scopus.com/inward/record.url?scp=85171154518&partnerID=8YFLogxK
U2 - 10.1109/ICME55011.2023.00036
DO - 10.1109/ICME55011.2023.00036
M3 - Conference contribution
AN - SCOPUS:85171154518
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
SP - 162
EP - 167
BT - Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
PB - IEEE Computer Society
T2 - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Y2 - 10 July 2023 through 14 July 2023
ER -
