An Adversarial Video Moment Retrieval Algorithm

Mohan Jia, Zhongjian Dai, Yaping Dai*, Zhiyang Jia

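*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

1 Citation (Scopus)

Abstract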

In one-stage methods for video moment retrieval, the common representations are supervised only indirectly by boundary prediction, so they fail to fully preserve the inherent characteristics of the video and the query, which limits retrieval accuracy. To solve this problem, an Adversarial Video Moment Retrieval (AVMR) algorithm is proposed to learn common representations with modality invariance and cross-modal similarity. AVMR is implemented through adversarial learning between a feature projector and a modality classifier. The feature projector tries to generate a modality-invariant common representation that confuses the modality classifier. The modality classifier tries to discriminate between the modalities based on the representation generated by the feature projector. Triplet constraints are further imposed on the feature projector to preserve the underlying cross-modal semantic structure of the data. The experimental results show that AVMR surpasses the baseline Attentive Cross-modal Relevance Matching (ACRM) by 1.10% and 1.73% in the 'mIoU' metric on two public datasets, Charades-STA and TACoS, respectively.
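
The interplay between the feature projector, the modality classifier and the triplet constraints can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the network architecture or the exact loss formulas, so the logistic classifier, the squared Euclidean distance, the `margin` and `lam` parameters and all function names below are assumptions made for illustration only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Hinge triplet loss (assumed form): pull the matching pair together
    # and push the mismatched sample at least `margin` further away.
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))

def modality_prob(rep, weights, bias):
    # Hypothetical logistic modality classifier: estimated probability
    # that the common representation `rep` came from the video branch.
    return 1.0 / (1.0 + math.exp(-(dot(weights, rep) + bias)))

def classifier_loss(video_reps, query_reps, weights, bias):
    # Binary cross-entropy with labels video=1, query=0.
    # The modality classifier tries to minimise this value.
    eps = 1e-12
    losses = [-math.log(modality_prob(r, weights, bias) + eps) for r in video_reps]
    losses += [-math.log(1.0 - modality_prob(r, weights, bias) + eps) for r in query_reps]
    return sum(losses) / len(losses)

def projector_loss(video_reps, query_reps, negative_reps, weights, bias,
                   margin=1.0, lam=1.0):
    # Assumed projector objective: keep the cross-modal structure (triplet term)
    # while making the classifier's job harder (minus its loss, weighted by lam).
    triplets = zip(query_reps, video_reps, negative_reps)
    trip = sum(triplet_loss(q, v, n, margin) for q, v, n in triplets) / len(query_reps)
    return trip - lam * classifier_loss(video_reps, query_reps, weights, bias)
```

In this sketch the two players optimise opposite goals. The classifier lowers `classifier_loss`, while the projector lowers `projector_loss`, which rewards representations the classifier cannot tell apart. When the classifier is fully undecided (probability 0.5 for every sample), `classifier_loss` is close to ln 2, about 0.693.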

Original language: English
Title of host publication: Proceedings of the 41st Chinese Control Conference, CCC 2022
Editors: Zhijun Li, Jian Sun
Publisher: IEEE Computer Society
Pages: 6689-6694
Number of pages: 6
ISBN (Electronic): 9789887581536
Publication status: Published - 2022
Event: 41st Chinese Control Conference, CCC 2022 - Hefei, China
Duration: 25 Jul 2022 to 27 Jul 2022

Publication series

Name: Chinese Control Conference, CCC
Volume: 2022-July
ISSN (Print): 1934-1768
ISSN (Electronic): 2161-2927

Conference

Conference: 41st Chinese Control Conference, CCC 2022
Country/Territory: China
City: Hefei
Period: 25/07/22 to 27/07/22
