Using Self-Supervised Dual Constraint Contrastive Learning for Cross-Modal Retrieval

Xintong Wang*, Xiaoyu Li, Liang Ding, Sanyuan Zhao, Chris Biemann

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Citations (Scopus)

Abstract

In this work, we present a self-supervised dual constraint contrastive method for efficiently fine-tuning the vision-language pre-trained (VLP) models that have achieved great success on various cross-modal tasks, since full fine-tune these pre-trained models is computationally expensive and tend to result in catastrophic forgetting restricted by the size and quality of labeled datasets. Our approach freezes the pre-trained VLP models as the fundamental, generalized, and transferable multimodal representation and incorporates lightweight parameters to learn domain and task-specific features without labeled data. We demonstrated that our self-supervised dual contrastive model performs better than previous fine-tuning methods on MS COCO and Flickr 30K datasets on the cross-modal retrieval task, with an even more pronounced improvement in zero-shot performance. Furthermore, experiments on the MOTIF dataset prove that our self-supervised approach remains effective when trained on a small, out-of-domain dataset without overfitting. As a plug-and-play method, our proposed method is agnostic to the underlying models and can be easily integrated with different VLP models, allowing for the potential incorporation of future advancements in VLP models.

Original languageEnglish
Title of host publicationECAI 2023 - 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings
EditorsKobi Gal, Kobi Gal, Ann Nowe, Grzegorz J. Nalepa, Roy Fairstein, Roxana Radulescu
PublisherIOS Press BV
Pages2552-2559
Number of pages8
ISBN (Electronic)9781643684369
DOIs
Publication statusPublished - 28 Sept 2023
Event26th European Conference on Artificial Intelligence, ECAI 2023 - Krakow, Poland
Duration: 30 Sept 20234 Oct 2023

Publication series

NameFrontiers in Artificial Intelligence and Applications
Volume372
ISSN (Print)0922-6389
ISSN (Electronic)1879-8314

Conference

Conference26th European Conference on Artificial Intelligence, ECAI 2023
Country/TerritoryPoland
CityKrakow
Period30/09/234/10/23

Fingerprint

Dive into the research topics of 'Using Self-Supervised Dual Constraint Contrastive Learning for Cross-Modal Retrieval'. Together they form a unique fingerprint.

Cite this