An Adversarial Video Moment Retrieval Algorithm

Mohan Jia, Zhongjian Dai, Yaping Dai^*, Zhiyang Jia

doi:10.23919/CCC55666.2022.9902146

^*Corresponding author for this work

School of Automation

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

In one-stage methods for video moment retrieval, the common representations are supervised only indirectly through boundary prediction and therefore fail to fully preserve the inherent characteristics of the video and the query, limiting retrieval accuracy. To solve this problem, an Adversarial Video Moment Retrieval (AVMR) algorithm is proposed to learn common representations with modality invariance and cross-modal similarity. AVMR is implemented through adversarial learning between a feature projector and a modality classifier. The feature projector tries to generate a modality-invariant common representation that confuses the modality classifier, while the modality classifier tries to discriminate between the modalities based on the representations generated by the feature projector. Triplet constraints are further imposed on the feature projector to preserve the underlying cross-modal semantic structure of the data. Experimental results show that AVMR surpasses the baseline Attentive Cross-modal Relevance Matching (ACRM) by 1.10% and 1.73% in the mIoU metric on the two public datasets Charades-STA and TACoS, respectively.
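The abstract names three quantities but gives no formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' AVMR implementation. It assumes the standard definition of mIoU for moment retrieval (temporal intersection over union between a predicted and a ground-truth (start, end) moment, averaged over queries), a triplet hinge constraint using cosine similarity with a margin of 0.2 (both the similarity and the margin are assumptions), and a binary cross-entropy for a modality classifier that outputs the probability that a representation came from the video. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Moment = Tuple[float, float]  # (start, end), e.g. in seconds


def temporal_iou(pred: Moment, gt: Moment) -> float:
    """Intersection over union of two time intervals (0.0 if they do not overlap)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Moment], gts: Sequence[Moment]) -> float:
    """mIoU: temporal IoU averaged over all (prediction, ground truth) pairs."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equally many, non-zero predictions and ground truths")
    return sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def triplet_hinge(anchor: Sequence[float], positive: Sequence[float],
                  negative: Sequence[float], margin: float = 0.2) -> float:
    """Zero once the matching pair beats the non-matching pair by at least `margin`."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def modality_bce(p_video: float, is_video: bool, eps: float = 1e-12) -> float:
    """Cross-entropy of a classifier predicting P(representation came from video)."""
    return -math.log(p_video + eps) if is_video else -math.log(1.0 - p_video + eps)


if __name__ == "__main__":
    print(mean_iou([(0.0, 10.0), (2.0, 4.0)], [(5.0, 15.0), (2.0, 4.0)]))  # (1/3 + 1) / 2
    print(triplet_hinge([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))  # 0.2 - 0 + 1
    print(modality_bce(0.5, True))  # log 2: a maximally confused classifier
```

In an adversarial setup of the kind the abstract describes, the modality classifier would be trained to minimize `modality_bce`, while the feature projector is trained to increase it (a gradient reversal layer is one common way to do this, though the abstract does not say which mechanism AVMR uses). A loss near log 2 for both modalities corresponds to a classifier that can no longer tell video representations from query representations.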

Original language	English
Title of host publication	Proceedings of the 41st Chinese Control Conference, CCC 2022
Editors	Zhijun Li, Jian Sun
Publisher	IEEE Computer Society
Pages	6689-6694
Number of pages	6
ISBN (Electronic)	9789887581536
DOIs	https://doi.org/10.23919/CCC55666.2022.9902146
Publication status	Published - 2022
Event	41st Chinese Control Conference, CCC 2022, Hefei, China (25 Jul 2022 → 27 Jul 2022)

Publication series

Name	Chinese Control Conference, CCC
Volume	2022-July
ISSN (Print)	1934-1768
ISSN (Electronic)	2161-2927

Conference

Conference	41st Chinese Control Conference, CCC 2022
Country/Territory	China
City	Hefei
Period	25/07/22 → 27/07/22

Keywords

Adversarial Learning
Cross-modal Retrieval
Deep learning
Video Moment Retrieval

Access to Document

10.23919/CCC55666.2022.9902146

Cite this

Jia, M., Dai, Z., Dai, Y., & Jia, Z. (2022). An Adversarial Video Moment Retrieval Algorithm. In Z. Li & J. Sun (Eds.), Proceedings of the 41st Chinese Control Conference, CCC 2022 (pp. 6689-6694). (Chinese Control Conference, CCC; Vol. 2022-July). IEEE Computer Society. https://doi.org/10.23919/CCC55666.2022.9902146

@inproceedings{c8a33507272942bfa5e4bd3c8932c266,
  title = "An Adversarial Video Moment Retrieval Algorithm",
  abstract = "In one-stage methods for video moment retrieval, the common representations indirectly supervised by boundary prediction fail to fully preserve the inherent characteristic of the video and query, which limits the retrieval accuracy. To solve this problem, an Adversarial Video Moment Retrieval (AVMR) algorithm is proposed to learn the common representations with modality invariance and cross-modal similarity. AVMR is implemented through the process of adversarial learning between a feature projector and a modality classifier. The feature projector tries to generate a modality-invariant common representation and to confuse the modality classifier. The modality classifier tries to discriminate between different modalities based on the generated representation by the feature projector. The triplet constraints are further imposed on the feature projector to preserve the underlying cross-modal semantic structure of data. The experimental results show that AVMR surpasses the baseline Attentive Cross-modal Relevance Matching (ACRM) by 1.10% and 1.73% in the 'mIoU' metric on two public datasets Charades-STA and TACoS, respectively.",
  keywords = "Adversarial Learning, Cross-modal Retrieval, Deep learning, Video Moment Retrieval",
  author = "Mohan Jia and Zhongjian Dai and Yaping Dai and Zhiyang Jia",
  note = "Publisher Copyright: {\textcopyright} 2022 Technical Committee on Control Theory, Chinese Association of Automation.; 41st Chinese Control Conference, CCC 2022 ; Conference date: 25-07-2022 Through 27-07-2022",
  year = "2022",
  doi = "10.23919/CCC55666.2022.9902146",
  isbn = "9789887581536",
  language = "English",
  series = "Chinese Control Conference, CCC",
  volume = "2022-July",
  publisher = "IEEE Computer Society",
  pages = "6689--6694",
  editor = "Zhijun Li and Jian Sun",
  booktitle = "Proceedings of the 41st Chinese Control Conference, CCC 2022",
  address = "United States",
}

Jia, M, Dai, Z, Dai, Y & Jia, Z 2022, An Adversarial Video Moment Retrieval Algorithm. in Z Li & J Sun (eds), Proceedings of the 41st Chinese Control Conference, CCC 2022. Chinese Control Conference, CCC, vol. 2022-July, IEEE Computer Society, pp. 6689-6694, 41st Chinese Control Conference, CCC 2022, Hefei, China, 25/07/22. https://doi.org/10.23919/CCC55666.2022.9902146

An Adversarial Video Moment Retrieval Algorithm. / Jia, Mohan; Dai, Zhongjian; Dai, Yaping et al.
Proceedings of the 41st Chinese Control Conference, CCC 2022. ed. / Zhijun Li; Jian Sun. IEEE Computer Society, 2022. p. 6689-6694 (Chinese Control Conference, CCC; Vol. 2022-July).


TY  - GEN
T1  - An Adversarial Video Moment Retrieval Algorithm
AU  - Jia, Mohan
AU  - Dai, Zhongjian
AU  - Dai, Yaping
AU  - Jia, Zhiyang
PY  - 2022
Y1  - 2022
N2  - In one-stage methods for video moment retrieval, the common representations indirectly supervised by boundary prediction fail to fully preserve the inherent characteristic of the video and query, which limits the retrieval accuracy. To solve this problem, an Adversarial Video Moment Retrieval (AVMR) algorithm is proposed to learn the common representations with modality invariance and cross-modal similarity. AVMR is implemented through the process of adversarial learning between a feature projector and a modality classifier. The feature projector tries to generate a modality-invariant common representation and to confuse the modality classifier. The modality classifier tries to discriminate between different modalities based on the generated representation by the feature projector. The triplet constraints are further imposed on the feature projector to preserve the underlying cross-modal semantic structure of data. The experimental results show that AVMR surpasses the baseline Attentive Cross-modal Relevance Matching (ACRM) by 1.10% and 1.73% in the 'mIoU' metric on two public datasets Charades-STA and TACoS, respectively.
AB  - In one-stage methods for video moment retrieval, the common representations indirectly supervised by boundary prediction fail to fully preserve the inherent characteristic of the video and query, which limits the retrieval accuracy. To solve this problem, an Adversarial Video Moment Retrieval (AVMR) algorithm is proposed to learn the common representations with modality invariance and cross-modal similarity. AVMR is implemented through the process of adversarial learning between a feature projector and a modality classifier. The feature projector tries to generate a modality-invariant common representation and to confuse the modality classifier. The modality classifier tries to discriminate between different modalities based on the generated representation by the feature projector. The triplet constraints are further imposed on the feature projector to preserve the underlying cross-modal semantic structure of data. The experimental results show that AVMR surpasses the baseline Attentive Cross-modal Relevance Matching (ACRM) by 1.10% and 1.73% in the 'mIoU' metric on two public datasets Charades-STA and TACoS, respectively.
KW  - Adversarial Learning
KW  - Cross-modal Retrieval
KW  - Deep learning
KW  - Video Moment Retrieval
UR  - http://www.scopus.com/inward/record.url?scp=85140451584&partnerID=8YFLogxK
U2  - 10.23919/CCC55666.2022.9902146
DO  - 10.23919/CCC55666.2022.9902146
M3  - Conference contribution
AN  - SCOPUS:85140451584
T3  - Chinese Control Conference, CCC
SP  - 6689
EP  - 6694
BT  - Proceedings of the 41st Chinese Control Conference, CCC 2022
A2  - Li, Zhijun
A2  - Sun, Jian
PB  - IEEE Computer Society
T2  - 41st Chinese Control Conference, CCC 2022
Y2  - 25 July 2022 through 27 July 2022
ER  - 
