Synthesizing Counterfactual Samples for Overcoming Moment Biases in Temporal Video Grounding

Mingliang Zhai, Chuanhao Li, Chenchen Jing, Yuwei Wu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Moment bias is a critical issue in temporal video grounding (TVG): models often exploit superficial correlations between language queries and moment locations as shortcuts for predicting temporal boundaries. In this paper, we propose a model-agnostic counterfactual samples synthesizing method that overcomes moment biases by endowing TVG models with sensitivity to linguistic and visual variations. Models with this sensitivity make full use of linguistic information and focus on important video clips rather than fixed patterns, and are therefore not dominated by moment biases. Specifically, we synthesize counterfactual samples for training TVG models by masking important words in queries or deleting important frames in videos. During training, we penalize the model if it makes similar predictions on counterfactual samples and original samples, which encourages it to perceive linguistic and visual variations. Experimental results on two datasets (i.e., Charades-CD and ActivityNet-CD) demonstrate the effectiveness of our method.
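The abstract describes two steps: synthesizing counterfactual samples by removing the most important query words or video frames, and penalizing the model when its prediction on a counterfactual sample stays close to its prediction on the original. The minimal sketch below illustrates that idea under explicit assumptions. It is not the authors' implementation. The abstract does not say how importance is measured, so importance scores are simply passed in. The hinge on temporal IoU between predicted moments is a hypothetical choice of penalty, and every name here (`MASK_TOKEN`, `mask_important_words`, `delete_important_frames`, `counterfactual_penalty`, `total_loss`, `margin`, `weight`) is illustrative.

```python
from typing import List, Sequence, Tuple

MASK_TOKEN = "[MASK]"  # hypothetical placeholder for masked query words

Moment = Tuple[float, float]  # (start, end) in seconds


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores; ties are broken by the lower index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:k])


def mask_important_words(words: Sequence[str], word_scores: Sequence[float], k: int = 1) -> List[str]:
    """Counterfactual query: replace the k most important words with MASK_TOKEN.
    Importance scores are assumed to come from elsewhere (not specified in the abstract)."""
    drop = set(top_k_indices(word_scores, k))
    return [MASK_TOKEN if i in drop else w for i, w in enumerate(words)]


def delete_important_frames(frames: Sequence[str], frame_scores: Sequence[float], k: int = 1) -> List[str]:
    """Counterfactual video: remove the k most important frames, keeping the order of the rest."""
    drop = set(top_k_indices(frame_scores, k))
    return [f for i, f in enumerate(frames) if i not in drop]


def temporal_iou(a: Moment, b: Moment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def counterfactual_penalty(pred_orig: Moment, pred_cf: Moment, margin: float = 0.5) -> float:
    """Hypothetical hinge penalty: positive only when the counterfactual prediction
    overlaps the original prediction by more than `margin` (temporal IoU)."""
    return max(0.0, temporal_iou(pred_orig, pred_cf) - margin)


def total_loss(grounding_loss: float, penalties: Sequence[float], weight: float = 1.0) -> float:
    """Task loss plus the weighted mean counterfactual penalty (assumed combination)."""
    if not penalties:
        return grounding_loss
    return grounding_loss + weight * sum(penalties) / len(penalties)


if __name__ == "__main__":
    print(mask_important_words(["person", "opens", "the", "door"], [0.2, 0.9, 0.1, 0.7]))
    print(delete_important_frames(["f0", "f1", "f2", "f3"], [0.1, 0.8, 0.3, 0.9], k=2))
    print(counterfactual_penalty((2.0, 6.0), (2.5, 6.0)))
```

Running the example masks "opens", keeps frames `f0` and `f2`, and yields a penalty of 0.375. The nearly unchanged predicted moment is penalized because the model did not react to the removed evidence. A differentiable version would operate on the model's boundary scores rather than on a hard IoU; the hard IoU is used here only to keep the sketch readable.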

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings
Editors: Shiqi Yu, Jianguo Zhang, Zhaoxiang Zhang, Tieniu Tan, Pong C. Yuen, Yike Guo, Junwei Han, Jianhuang Lai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 436-448
Number of pages: 13
ISBN (Print): 9783031189067
Publication status: Published - 2022
Event: 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022 - Shenzhen, China
Duration: 4 Nov 2022 to 7 Nov 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13534 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022
Country/Territory: China
City: Shenzhen
Period: 4/11/22 to 7/11/22

Keywords

  • Counterfactual samples
  • Moment biases
  • Temporal video grounding

