TY - JOUR
T1 - Unsupervised Dual Hashing Coding (UDC) on Semantic Tagging and Sample Content for Cross-Modal Retrieval
AU - Cai, Hongmin
AU - Zhang, Bin
AU - Li, Junyu
AU - Hu, Bin
AU - Chen, Jiazhou
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Current cross-modal retrieval methods rely heavily on accurate semantic labels or sample-wise similarity measurements, and must search for the nearest samples among all samples in a huge search space, severely limiting their application to stratifying large-scale, high-dimensional multimodal data. To tackle these issues, this paper proposes an unsupervised cross-modal retrieval method, named unsupervised dual hashing coding (UDC), that bypasses semantic-wise supervision and sample-wise similarity from the standpoint of feature-wise matching. It jointly learns dual hashing codes on semantic tagging and sample content by factorizing a feature matching potential, which simultaneously bridges the semantic and heterogeneous gaps among different modalities by maintaining inter-modality-consistent semantic information and cross-modality-correlated sample content. In this way, each sample is uniquely coded by a head code for semantic-wise tags and tail codes for sample-wise content. The dual coding design makes sample retrieval highly efficient: a query sample only needs to search among the samples sharing the same semantic tag, greatly narrowing down the search space. The proposed model avoids computing massive sample-wise similarities and works with a dual hashing coding scheme, achieving a twofold efficiency enhancement for analyzing large-scale, high-dimensional multimodal data. Extensive experiments demonstrate its superiority in computational time and retrieval performance.
AB - Current cross-modal retrieval methods rely heavily on accurate semantic labels or sample-wise similarity measurements, and must search for the nearest samples among all samples in a huge search space, severely limiting their application to stratifying large-scale, high-dimensional multimodal data. To tackle these issues, this paper proposes an unsupervised cross-modal retrieval method, named unsupervised dual hashing coding (UDC), that bypasses semantic-wise supervision and sample-wise similarity from the standpoint of feature-wise matching. It jointly learns dual hashing codes on semantic tagging and sample content by factorizing a feature matching potential, which simultaneously bridges the semantic and heterogeneous gaps among different modalities by maintaining inter-modality-consistent semantic information and cross-modality-correlated sample content. In this way, each sample is uniquely coded by a head code for semantic-wise tags and tail codes for sample-wise content. The dual coding design makes sample retrieval highly efficient: a query sample only needs to search among the samples sharing the same semantic tag, greatly narrowing down the search space. The proposed model avoids computing massive sample-wise similarities and works with a dual hashing coding scheme, achieving a twofold efficiency enhancement for analyzing large-scale, high-dimensional multimodal data. Extensive experiments demonstrate its superiority in computational time and retrieval performance.
KW - Cross-modal retrieval
KW - dual hashing coding
KW - hashing
KW - large-scale
KW - multimodal
KW - unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85190172487&partnerID=8YFLogxK
U2 - 10.1109/TMM.2024.3385986
DO - 10.1109/TMM.2024.3385986
M3 - Article
AN - SCOPUS:85190172487
SN - 1520-9210
VL - 26
SP - 9109
EP - 9120
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -