An Adversarial Video Moment Retrieval Algorithm

Mohan Jia, Zhongjian Dai, Yaping Dai*, Zhiyang Jia

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Citation (Scopus)

Abstract

In one-stage methods for video moment retrieval, the common representations indirectly supervised by boundary prediction fail to fully preserve the inherent characteristic of the video and query, which limits the retrieval accuracy. To solve this problem, an Adversarial Video Moment Retrieval (AVMR) algorithm is proposed to learn the common representations with modality invariance and cross-modal similarity. AVMR is implemented through the process of adversarial learning between a feature projector and a modality classifier. The feature projector tries to generate a modality-invariant common representation and to confuse the modality classifier. The modality classifier tries to discriminate between different modalities based on the generated representation by the feature projector. The triplet constraints are further imposed on the feature projector to preserve the underlying cross-modal semantic structure of data. The experimental results show that AVMR surpasses the baseline Attentive Cross-modal Relevance Matching (ACRM) by 1.10% and 1.73% in the 'mIoU' metric on two public datasets Charades-STA and TACoS, respectively.

Original languageEnglish
Title of host publicationProceedings of the 41st Chinese Control Conference, CCC 2022
EditorsZhijun Li, Jian Sun
PublisherIEEE Computer Society
Pages6689-6694
Number of pages6
ISBN (Electronic)9789887581536
DOIs
Publication statusPublished - 2022
Event41st Chinese Control Conference, CCC 2022 - Hefei, China
Duration: 25 Jul 202227 Jul 2022

Publication series

NameChinese Control Conference, CCC
Volume2022-July
ISSN (Print)1934-1768
ISSN (Electronic)2161-2927

Conference

Conference41st Chinese Control Conference, CCC 2022
Country/TerritoryChina
CityHefei
Period25/07/2227/07/22

Keywords

  • Adversarial Learning
  • Cross-modal Retrieval
  • Deep learning
  • Video Moment Retrieval

Fingerprint

Dive into the research topics of 'An Adversarial Video Moment Retrieval Algorithm'. Together they form a unique fingerprint.

Cite this