An Adversarial Video Moment Retrieval Algorithm

Mohan Jia, Zhongjian Dai, Yaping Dai*, Zhiyang Jia

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Citation (Scopus)


In one-stage methods for video moment retrieval, the common representations are supervised only indirectly, through boundary prediction, and so fail to fully preserve the inherent characteristics of the video and the query, which limits retrieval accuracy. To solve this problem, an Adversarial Video Moment Retrieval (AVMR) algorithm is proposed to learn common representations with modality invariance and cross-modal similarity. AVMR is implemented through adversarial learning between a feature projector and a modality classifier. The feature projector tries to generate a modality-invariant common representation that confuses the modality classifier. The modality classifier tries to discriminate between the modalities based on the representation generated by the feature projector. Triplet constraints are further imposed on the feature projector to preserve the underlying cross-modal semantic structure of the data. The experimental results show that AVMR surpasses the baseline Attentive Cross-modal Relevance Matching (ACRM) by 1.10% and 1.73% in the 'mIoU' metric on two public datasets, Charades-STA and TACoS, respectively.
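The interplay between the feature projector, the modality classifier, and the triplet constraints can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything concrete here is an assumption made for illustration: the Euclidean distance, the margin value, the linear classifier with binary cross-entropy, the way the two terms are combined, and all names (`triplet_loss`, `modality_loss`, `projector_loss`, `adv_weight`). It should not be read as the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: the matched pair should be closer than the
    mismatched pair by at least `margin` (margin value is an assumption)."""
    return max(0.0, margin + euclidean(anchor, positive) - euclidean(anchor, negative))


def prob_video(rep, weights, bias):
    """Hypothetical linear modality classifier: P(rep came from the video)."""
    score = sum(w * r for w, r in zip(weights, rep)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def modality_loss(video_reps, query_reps, weights, bias):
    """Binary cross-entropy of the classifier (label 1 = video, 0 = query).
    The classifier is trained to minimize this value."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for rep in video_reps:
        total -= math.log(prob_video(rep, weights, bias) + eps)
    for rep in query_reps:
        total -= math.log(1.0 - prob_video(rep, weights, bias) + eps)
    return total / (len(video_reps) + len(query_reps))


def projector_loss(video_reps, query_reps, negative_reps, weights, bias,
                   adv_weight=1.0, margin=0.2):
    """Assumed projector objective: keep the triplet structure while making
    the classifier's loss large, i.e. confusing the modality classifier."""
    triplet = sum(
        triplet_loss(v, q, n, margin)
        for v, q, n in zip(video_reps, query_reps, negative_reps)
    ) / len(video_reps)
    return triplet - adv_weight * modality_loss(video_reps, query_reps, weights, bias)


if __name__ == "__main__":
    # Toy 2-D common representations; the values are made up for the demo.
    video = [[1.0, 0.0], [0.0, 1.0]]
    query = [[0.9, 0.1], [0.1, 0.9]]        # matched queries
    negatives = [[0.1, 0.9], [0.9, 0.1]]    # mismatched queries
    w, b = [0.0, 0.0], 0.0                  # a classifier that cannot tell modalities apart
    print("classifier loss:", modality_loss(video, query, w, b))
    print("projector loss:", projector_loss(video, query, negatives, w, b))
```

In an adversarial training loop under these assumptions, the two objectives would be minimized in alternation: the classifier parameters on `modality_loss`, and the projector parameters on `projector_loss`.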

Original language: English
Title of host publication: Proceedings of the 41st Chinese Control Conference, CCC 2022
Editors: Zhijun Li, Jian Sun
Publisher: IEEE Computer Society
Number of pages: 6
ISBN (Electronic): 9789887581536
Publication status: Published - 2022
Event: 41st Chinese Control Conference, CCC 2022 - Hefei, China
Duration: 25 Jul 2022 to 27 Jul 2022

Publication series

Name: Chinese Control Conference, CCC
ISSN (Print): 1934-1768
ISSN (Electronic): 2161-2927


Conference: 41st Chinese Control Conference, CCC 2022


  • Adversarial Learning
  • Cross-modal Retrieval
  • Deep learning
  • Video Moment Retrieval



Cite this

Jia, M., Dai, Z., Dai, Y., & Jia, Z. (2022). An Adversarial Video Moment Retrieval Algorithm. In Z. Li, & J. Sun (Eds.), Proceedings of the 41st Chinese Control Conference, CCC 2022 (pp. 6689-6694). (Chinese Control Conference, CCC; Vol. 2022-July). IEEE Computer Society.