Abstract
The key issue in cross-modal retrieval with cross-modal hashing is how to maximize the consistency of the semantic relationships between heterogeneous media data. This paper presents a self-supervised deep semantics-preserving hashing network (UDSPH) that generates compact hash codes in an end-to-end architecture. Two modality-specific hashing networks are first trained to generate hash codes and high-level features. The semantic relationship between the modalities is then measured with cross-modal attention mechanisms that maximize the preservation of local semantic correlations. Multi-label semantic information in the training data simultaneously guides the training of the two modality-specific hashing networks through self-supervised adversarial learning. This yields a deep semantic hashing network that preserves semantic associations in the global view and improves the discriminative capability of the generated hash codes. Tests on three widely used benchmark datasets verify the effectiveness of this method.
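As a rough illustration of the two-branch design the abstract describes, the sketch below (PyTorch) builds two modality-specific networks that map image and text features to relaxed K-bit codes. All input dimensions, layer sizes, and the tanh relaxation are common deep-hashing conventions assumed for illustration, not details confirmed by this abstract; the paper's cross-modal attention and self-supervised adversarial components are omitted.

```python
# Illustrative sketch only: two modality-specific hashing branches.
# Dimensions (4096-d image features, 1386-d text features, 64-bit codes)
# are hypothetical, not taken from the paper.
import torch
import torch.nn as nn

class ModalityHashNet(nn.Module):
    """Maps one modality's features to a K-bit code via a tanh relaxation."""
    def __init__(self, in_dim: int, code_bits: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, code_bits),
            nn.Tanh(),  # continuous surrogate for binary codes during training
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # values in (-1, 1)

img_net = ModalityHashNet(in_dim=4096, code_bits=64)
txt_net = ModalityHashNet(in_dim=1386, code_bits=64)

img_codes = img_net(torch.randn(8, 4096))
txt_codes = txt_net(torch.randn(8, 1386))

# Cross-modal similarity on the relaxed codes; at retrieval time the
# final binary hash codes are obtained with sign().
sim = img_codes @ txt_codes.t() / 64
binary_img_codes = torch.sign(img_codes)
```

In this kind of pipeline, training losses would pull the relaxed codes of semantically related image-text pairs together before binarization; the abstract's attention and adversarial modules serve that role in the actual method.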
| Translated title of the contribution | Self-supervised deep semantics-preserving Hashing for cross-modal retrieval |
| --- | --- |
| Original language | Chinese (Traditional) |
| Pages (from-to) | 1442-1449 |
| Number of pages | 8 |
| Journal | Qinghua Daxue Xuebao/Journal of Tsinghua University |
| Volume | 62 |
| Issue number | 9 |
| DOIs | |
| Publication status | Published - 15 Sept 2022 |