Spatial-temporal Causal Inference for Partial Image-to-video Adaptation

Jin Chen, Xinxiao Wu^*, Yao Hu, Jiebo Luo

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Citations (Scopus)

Abstract

Image-to-video adaptation leverages off-the-shelf models learned from labeled images to help classify unlabeled videos, thus alleviating the high computational overhead of training a video classifier from scratch. The task is challenging because two types of domain shift exist between images and videos: 1) a spatial domain shift caused by the variance in static appearance between images and video frames, and 2) a temporal domain shift caused by the absence of dynamic motion in images. Moreover, these two domain shifts affect the domain gap differently for different video classes and should not be treated equally during adaptation. In this paper, we propose a spatial-temporal causal inference framework for image-to-video adaptation. We first construct a spatial-temporal causal graph and infer the effects of the spatial and temporal domain shifts through counterfactual inference. We then learn causality-guided bidirectional heterogeneous mappings between images and videos to adaptively reduce the two domain shifts. Furthermore, to relax the assumption made by existing methods that the image and video domains share the same label space, we incorporate class-wise alignment into the learning of the image-video mappings, enabling partial image-to-video adaptation in which the image label space subsumes the video label space. Extensive experiments on several video datasets validate the effectiveness of the proposed method.
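
To make the partial setting concrete, the following is a minimal sketch of a class-weighting heuristic commonly used in partial domain adaptation. It is not the method of this paper: the abstract does not specify how class-wise alignment is implemented, and the function names, the toy probabilities, and the weighting rule itself are all assumptions made for illustration. The idea is that image classes absent from the video domain tend to receive low average predicted probability on unlabeled videos, so they can be down-weighted.

```python
from typing import List, Sequence


def class_weights(target_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average predicted class probabilities over unlabeled target (video)
    samples, then rescale so the largest weight equals 1.0.
    Hypothetical helper, not taken from the paper."""
    if not target_probs:
        raise ValueError("need at least one target prediction")
    num_classes = len(target_probs[0])
    totals = [0.0] * num_classes
    for probs in target_probs:
        if len(probs) != num_classes:
            raise ValueError("inconsistent number of classes")
        for k, p in enumerate(probs):
            totals[k] += p
    means = [t / len(target_probs) for t in totals]
    peak = max(means)
    if peak <= 0.0:
        raise ValueError("predictions must contain positive probabilities")
    return [m / peak for m in means]


def weighted_source_loss(
    per_sample_loss: Sequence[float],
    labels: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Mean of per-sample source (image) losses, each scaled by the weight
    of its class, so classes unlikely to appear in videos contribute less."""
    weighted = [weights[y] * loss for loss, y in zip(per_sample_loss, labels)]
    return sum(weighted) / len(weighted)


# Toy data (assumed): three image classes, while the videos only
# contain classes 0 and 1, so class 2 gets little probability mass.
video_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
]
weights = class_weights(video_probs)
print([round(w, 2) for w in weights])  # [0.8, 1.0, 0.2]
```

In this toy case the image-only class 2 ends up with the smallest weight, which is the behavior one would want when the image label space subsumes the video label space.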

Original language	English
Title of host publication	35th AAAI Conference on Artificial Intelligence, AAAI 2021
Publisher	Association for the Advancement of Artificial Intelligence
Pages	1027-1035
Number of pages	9
ISBN (Electronic)	9781713835974
Publication status	Published - 2021
Event	35th AAAI Conference on Artificial Intelligence, AAAI 2021 - Virtual, Online
Duration	2 Feb 2021 → 9 Feb 2021

Publication series

Name	35th AAAI Conference on Artificial Intelligence, AAAI 2021
Volume	2A

Conference

Conference	35th AAAI Conference on Artificial Intelligence, AAAI 2021
City	Virtual, Online
Period	2/02/21 → 9/02/21

Cite this

Chen, J., Wu, X., Hu, Y., & Luo, J. (2021). Spatial-temporal Causal Inference for Partial Image-to-video Adaptation. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (pp. 1027-1035). (35th AAAI Conference on Artificial Intelligence, AAAI 2021; Vol. 2A). Association for the Advancement of Artificial Intelligence.

@inproceedings{64feedfdba5345afb13a2dedea17a237,

title = "Spatial-temporal Causal Inference for Partial Image-to-video Adaptation",

author = "Jin Chen and Xinxiao Wu and Yao Hu and Jiebo Luo",

note = "Publisher Copyright: Copyright {\textcopyright} 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved; 35th AAAI Conference on Artificial Intelligence, AAAI 2021 ; Conference date: 02-02-2021 Through 09-02-2021",

year = "2021",

language = "English",

series = "35th AAAI Conference on Artificial Intelligence, AAAI 2021",

publisher = "Association for the Advancement of Artificial Intelligence",

pages = "1027--1035",

booktitle = "35th AAAI Conference on Artificial Intelligence, AAAI 2021",

}

Chen, J, Wu, X, Hu, Y & Luo, J 2021, Spatial-temporal Causal Inference for Partial Image-to-video Adaptation. in 35th AAAI Conference on Artificial Intelligence, AAAI 2021. 35th AAAI Conference on Artificial Intelligence, AAAI 2021, vol. 2A, Association for the Advancement of Artificial Intelligence, pp. 1027-1035, 35th AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual, Online, 2/02/21.

Spatial-temporal Causal Inference for Partial Image-to-video Adaptation. / Chen, Jin; Wu, Xinxiao; Hu, Yao et al.
35th AAAI Conference on Artificial Intelligence, AAAI 2021. Association for the Advancement of Artificial Intelligence, 2021. p. 1027-1035 (35th AAAI Conference on Artificial Intelligence, AAAI 2021; Vol. 2A).

TY - GEN

T1 - Spatial-temporal Causal Inference for Partial Image-to-video Adaptation

AU - Chen, Jin

AU - Wu, Xinxiao

AU - Hu, Yao

AU - Luo, Jiebo

PY - 2021

Y1 - 2021

UR - http://www.scopus.com/inward/record.url?scp=85121642516&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85121642516

T3 - 35th AAAI Conference on Artificial Intelligence, AAAI 2021

SP - 1027

EP - 1035

BT - 35th AAAI Conference on Artificial Intelligence, AAAI 2021

PB - Association for the Advancement of Artificial Intelligence

T2 - 35th AAAI Conference on Artificial Intelligence, AAAI 2021

Y2 - 2 February 2021 through 9 February 2021

ER -
