TY - GEN
T1 - Spatial-temporal Causal Inference for Partial Image-to-video Adaptation
AU - Chen, Jin
AU - Wu, Xinxiao
AU - Hu, Yao
AU - Luo, Jiebo
N1 - Publisher Copyright:
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2021
Y1 - 2021
N2 - Image-to-video adaptation leverages off-the-shelf models learned from labeled images to help classify unlabeled videos, thus alleviating the high computational overhead of training a video classifier from scratch. This task is very challenging because two types of domain shift exist between images and videos: 1) spatial domain shift, caused by static appearance variance between images and video frames, and 2) temporal domain shift, caused by the absence of dynamic motion in images. Moreover, for different video classes, these two domain shifts contribute differently to the domain gap and should not be treated equally during adaptation. In this paper, we propose a spatial-temporal causal inference framework for image-to-video adaptation. We first construct a spatial-temporal causal graph to infer the effects of the spatial and temporal domain shifts by performing counterfactual inference. We then learn causality-guided bidirectional heterogeneous mappings between images and videos to adaptively reduce the two domain shifts. Moreover, to relax the assumption made by existing methods that the label spaces of the image and video domains are identical, we incorporate class-wise alignment into the learning of image-video mappings, enabling partial image-to-video adaptation in which the image label space subsumes the video label space. Extensive experiments on several video datasets validate the effectiveness of the proposed method.
UR - http://www.scopus.com/inward/record.url?scp=85121642516&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85121642516
T3 - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
SP - 1027
EP - 1035
BT - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
PB - Association for the Advancement of Artificial Intelligence
T2 - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Y2 - 2 February 2021 through 9 February 2021
ER -