Adversarial task-specific learning

Xin Fu, Yao Zhao*, Ting Liu, Yunchao Wei, Jianan Li, Shikui Wei

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we investigate a principled way to learn a common feature space for data of different modalities (e.g. image and text), so that the similarity between items of different modalities can be measured directly, benefiting cross-modal retrieval. To keep the common feature embeddings semantically and distributionally consistent, we propose a new Adversarial Task-Specific Learning (ATSL) approach that learns distinct embeddings for the different retrieval tasks, i.e. images retrieving texts (I2T) and texts retrieving images (T2I). In particular, the proposed ATSL has the following advantages: (a) semantic attributes are leveraged to encourage the learned common feature embeddings of matched pairs to be semantically consistent; (b) adversarial learning is applied to relieve the distribution inconsistency of the common feature embeddings across modalities; (c) triplet optimization is employed to guarantee that similar items from different modalities have smaller distances in the learned common space than dissimilar ones; (d) task-specific learning produces common feature embeddings that are better optimized for each retrieval task. ATSL is embedded in a deep neural network and can be learned in an end-to-end manner. We conduct extensive experiments on two popular benchmark datasets, Flickr30K and MS COCO, and achieve R@1 accuracies of 57.1% and 38.4% for I2T and 56.5% and 38.6% for T2I on MS COCO and Flickr30K respectively, setting a new state of the art.
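
The loss structure described above (a triplet ranking loss in a shared embedding space plus an adversarial modality discriminator) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the embedding dimension, discriminator architecture, margin, and helper names (ModalityDiscriminator, triplet_loss, adversarial_losses) are assumptions, and the attribute-based semantic loss and the task-specific I2T/T2I branches from the paper are omitted.

# Sketch of the two losses named in the abstract, under assumed shapes and names.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 512  # assumed size of the common embedding space

class ModalityDiscriminator(nn.Module):
    """Predicts whether an embedding came from the image or the text encoder."""
    def __init__(self, dim=EMB_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, z):
        return self.net(z)  # raw logits; by convention 1 = image, 0 = text

def triplet_loss(img_emb, txt_emb, margin=0.2):
    """Hinge triplet loss over in-batch negatives, hardest negative per anchor."""
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    sim = img_emb @ txt_emb.t()                    # cosine similarity matrix
    pos = sim.diag().unsqueeze(1)                  # matched image-text pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    # I2T direction: for each image, penalize the hardest non-matching caption
    cost_i2t = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
    # T2I direction: for each caption, penalize the hardest non-matching image
    cost_t2i = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)
    return cost_i2t.max(1)[0].mean() + cost_t2i.max(0)[0].mean()

def adversarial_losses(disc, img_emb, txt_emb):
    """Discriminator tries to tell the modalities apart; encoders try to fool it."""
    img_logit, txt_logit = disc(img_emb), disc(txt_emb)
    ones, zeros = torch.ones_like(img_logit), torch.zeros_like(txt_logit)
    d_loss = F.binary_cross_entropy_with_logits(img_logit, ones) + \
             F.binary_cross_entropy_with_logits(txt_logit, zeros)
    # Encoder-side loss flips the labels so the two distributions are pushed together.
    g_loss = F.binary_cross_entropy_with_logits(img_logit, zeros) + \
             F.binary_cross_entropy_with_logits(txt_logit, ones)
    return d_loss, g_loss

In a typical adversarial training loop, d_loss would update only the discriminator while the triplet loss and g_loss update the modality encoders; the relative weighting of these terms is another assumption not taken from the paper.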

Original language: English
Pages (from-to): 118-128
Number of pages: 11
Journal: Neurocomputing
Volume: 362
DOIs
Publication status: Published - 14 Oct 2019

Keywords

  • Adversarial learning
  • Cross-modal retrieval
  • Subspace learning
