Deep Cross-Modal Proxy Hashing

Rong Cheng Tu, Xian Ling Mao*, Rong Xin Tu, Binbin Bian, Chengfei Cai, Hongfa Wang, Wei Wei, Heyan Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

Due to their high retrieval efficiency and low storage cost in cross-modal search tasks, cross-modal hashing methods have attracted considerable attention from researchers. For supervised cross-modal hashing methods, the key to further improving retrieval performance is making the learned hash codes sufficiently preserve the semantic information contained in the labels of data points. Hence, almost all supervised cross-modal hashing methods depend, fully or partly, on similarities between data points defined from label information to guide the learning of the hashing model. However, such defined similarities capture the label information of data points only partially and miss abundant semantic information, which hinders further improvement of retrieval performance. Thus, in this paper, different from previous works, we propose a novel cross-modal hashing method that does not define similarities between data points, called Deep Cross-modal Proxy Hashing (DCPH). Specifically, DCPH first trains a proxy hashing network that transforms the category information of a dataset into semantically discriminative hash codes, called proxy hash codes; each proxy hash code preserves the semantic information of its corresponding category well. Next, instead of defining similarities between data points to supervise the training of the modality-specific hashing networks, we propose a novel margin-dynamic-softmax loss that directly utilizes the proxy hash codes as supervisory information. Finally, by minimizing this margin-dynamic-softmax loss, the modality-specific hashing networks are trained to generate hash codes that simultaneously preserve both cross-modal similarity and abundant semantic information. Extensive experiments on three benchmark datasets show that the proposed method outperforms state-of-the-art baselines on cross-modal retrieval tasks.
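
As a reading aid, below is a minimal sketch, in PyTorch, of how fixed proxy hash codes could supervise a modality-specific hashing network through a margin-softmax-style loss. It assumes a single-label setting and a fixed margin; the function name, tensor shapes, and hyperparameters are all illustrative assumptions, and the paper's actual margin-dynamic-softmax loss adapts the margin rather than fixing it.

    # Sketch only: not the authors' implementation of DCPH.
    import torch
    import torch.nn.functional as F

    def proxy_margin_softmax_loss(u, proxy_codes, labels, margin=0.2, scale=8.0):
        """u:           (B, K) relaxed codes from a modality-specific network (e.g. tanh output)
        proxy_codes: (C, K) fixed {-1, +1} proxy hash codes, one per category
        labels:      (B,)   ground-truth category indices (single-label case)"""
        # Cosine similarity between each relaxed code and every proxy code.
        logits = F.normalize(u, dim=1) @ F.normalize(proxy_codes.float(), dim=1).t()
        # Subtract a margin from the similarity to the true category's proxy code,
        # so the network must beat the other proxies by at least `margin`.
        one_hot = F.one_hot(labels, num_classes=proxy_codes.size(0)).to(logits.dtype)
        logits = logits - margin * one_hot
        # Softmax cross-entropy over the (scaled) proxy similarities pulls each
        # code toward its category's proxy and away from the other proxies.
        return F.cross_entropy(scale * logits, labels)

    # Toy usage with random tensors (K = 32-bit codes, C = 10 categories).
    B, K, C = 16, 32, 10
    u = torch.tanh(torch.randn(B, K))                  # stand-in network output
    proxy_codes = torch.randint(0, 2, (C, K)) * 2 - 1  # random +/-1 proxy codes
    labels = torch.randint(0, C, (B,))
    loss = proxy_margin_softmax_loss(u, proxy_codes, labels)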

Original language: English
Pages (from-to): 6798-6810
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 35
Issue number: 7
DOIs
Publication status: Published - 1 Jul 2023

Keywords

  • Cross-modal retrieval
  • deep supervised hashing
  • margin-dynamic-softmax loss
  • proxy code
