Synthesizing Counterfactual Samples for Overcoming Moment Biases in Temporal Video Grounding

Mingliang Zhai, Chuanhao Li, Chenchen Jing, Yuwei Wu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

Moment bias is a critical issue in temporal video grounding (TVG), where models often exploit superficial correlations between language queries and moment locations as shortcuts to predict temporal boundaries. In this paper, we propose a model-agnostic counterfactual sample synthesis method that overcomes moment biases by making TVG models sensitive to linguistic and visual variations. Models with this sensitivity make full use of linguistic information and focus on important video clips rather than fixed patterns, and are therefore not dominated by moment biases. Specifically, we synthesize counterfactual samples for training TVG models by masking important words in queries or deleting important frames in videos. During training, we penalize the model if it makes similar predictions on counterfactual and original samples, which encourages it to perceive linguistic and visual variations. Experimental results on two datasets (i.e., Charades-CD and ActivityNet-CD) demonstrate the effectiveness of our method.
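The two synthesis operations and the training penalty described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how importance is estimated or what form the penalty takes, so the sketch assumes per-token and per-frame importance scores are already given, uses a hypothetical `[MASK]` placeholder, and stands in a hinge on temporal IoU (with an assumed `margin`) for the similarity penalty. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

MASK_TOKEN = "[MASK]"  # hypothetical placeholder; the abstract does not name one


def _top_k_indices(importance: Sequence[float], k: int) -> set:
    """Indices of the k highest importance scores (scores are assumed given)."""
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return set(order[:k])


def mask_important_words(tokens: Sequence[str], importance: Sequence[float], k: int = 1) -> List[str]:
    """Linguistic counterfactual: replace the k most important query tokens."""
    top = _top_k_indices(importance, k)
    return [MASK_TOKEN if i in top else tok for i, tok in enumerate(tokens)]


def delete_important_frames(frames: Sequence, importance: Sequence[float], k: int = 1) -> list:
    """Visual counterfactual: drop the k most important frames, keeping order."""
    top = _top_k_indices(importance, k)
    return [f for i, f in enumerate(frames) if i not in top]


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """IoU of two (start, end) moments; the union is taken as the covering span."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0


def similarity_penalty(pred_original: Tuple[float, float],
                       pred_counterfactual: Tuple[float, float],
                       margin: float = 0.5) -> float:
    """Assumed hinge penalty: positive only when the two predictions overlap
    by more than `margin`, i.e. when the model ignored the perturbation."""
    return max(0.0, temporal_iou(pred_original, pred_counterfactual) - margin)
```

For example, `mask_important_words(["person", "opens", "the", "door"], [0.2, 0.9, 0.05, 0.6])` masks "opens", and a model that predicts the same moment for both the original and the counterfactual sample receives the largest penalty under this sketch.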

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings
Editors: Shiqi Yu, Jianguo Zhang, Zhaoxiang Zhang, Tieniu Tan, Pong C. Yuen, Yike Guo, Junwei Han, Jianhuang Lai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 436-448
Number of pages: 13
ISBN (Print): 9783031189067
Publication status: Published - 2022
Event: 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022 - Shenzhen, China
Duration: 4 Nov 2022 to 7 Nov 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13534 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022
Country/Territory: China
City: Shenzhen
Period: 4/11/22 to 7/11/22
