Unsupervised Cross-Modal Hashing via Semantic Text Mining

Rong Cheng Tu, Xian Ling Mao*, Qinghong Lin, Wenjin Ji, Weize Qin, Wei Wei, Heyan Huang

*Corresponding author for this work

School of Computer Science and Technology

doi:10.1109/TMM.2023.3243608

Research output: Contribution to journal, Article, peer-reviewed

29 Citations (Scopus)

Abstract

Cross-modal hashing has been widely used in multimedia retrieval tasks due to its fast retrieval speed and low storage cost. Recently, many deep unsupervised cross-modal hashing methods have been proposed to handle unlabeled datasets. These methods usually construct an instance similarity matrix by fusing the image and text modality-specific similarity matrices, and use it as guiding information to train the hashing networks. However, most of them directly use the cosine similarities between the bag-of-words (BoW) vectors of text datapoints to define the text modality-specific similarity matrix, which fails to mine the semantic similarity information contained in the text datapoints and leads to a poor-quality instance similarity matrix. To tackle this problem, we propose a novel method, Unsupervised Cross-modal Hashing via Semantic Text Mining (UCHSTM). Specifically, UCHSTM first mines the correlations between the words of text datapoints. Then, it constructs the text modality-specific similarity matrix for the training instances based on the mined correlations between their words. Next, it fuses the image and text modality-specific similarity matrices into the final instance similarity matrix, which guides the training of the hashing model. Furthermore, a novel self-redefined-similarity loss is proposed to correct some wrongly defined similarities in the constructed instance similarity matrix during training of the hashing networks, thereby further enhancing retrieval performance. Extensive experiments on two widely used datasets show that the proposed UCHSTM outperforms state-of-the-art baselines on cross-modal retrieval tasks.
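The contrast the abstract draws, plain cosine similarity between BoW vectors versus a text similarity informed by word correlations, can be illustrated with a minimal Python sketch. It is not the UCHSTM algorithm: the word-correlation matrix `corr` is hypothetical and simply given rather than mined, `correlated_bow` (which maps each BoW vector through `corr` before taking the cosine) is an illustrative stand-in, and `fuse` assumes a plain weighted average with a hypothetical weight `alpha`, since the abstract does not specify the fusion rule.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(vectors):
    """Modality-specific similarity matrix: S[i][j] = cosine(x_i, x_j)."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def correlated_bow(bow, corr):
    """Hypothetical stand-in for word-correlation mining: returns bow @ corr,
    so each word's count also credits the words correlated with it."""
    n = len(corr)
    return [sum(bow[k] * corr[k][j] for k in range(n)) for j in range(n)]


def fuse(s_img, s_txt, alpha=0.5):
    """Assumed fusion rule: element-wise alpha * S_img + (1 - alpha) * S_txt."""
    return [[alpha * a + (1 - alpha) * b for a, b in zip(row_i, row_t)]
            for row_i, row_t in zip(s_img, s_txt)]


if __name__ == "__main__":
    # Toy vocabulary ["dog", "puppy", "car"]; one word per caption.
    texts = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    # Hypothetical word correlations: "dog" and "puppy" are strongly related.
    corr = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
    images = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy image features

    s_bow = similarity_matrix(texts)  # plain BoW cosine (the baseline)
    s_txt = similarity_matrix([correlated_bow(t, corr) for t in texts])
    s_fused = fuse(similarity_matrix(images), s_txt)

    print(round(s_bow[0][1], 3), round(s_txt[0][1], 3))  # 0.0 0.976
    print(round(s_fused[0][1], 3))
```

In the toy data the captions "dog" and "puppy" share no words, so their BoW cosine is 0, while the correlation-aware version scores them as highly similar; this is the kind of semantic similarity that, according to the abstract, plain BoW cosine fails to capture.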

Original language	English
Pages (from-to)	8946-8957
Number of pages	12
Journal	IEEE Transactions on Multimedia
Volume	25
DOI	https://doi.org/10.1109/TMM.2023.3243608
Publication status	Published - 2023

Keywords

Cross-modal retrieval
deep supervised hashing
self-redefined-similarity loss
semantic text mining


Cite this

Tu, R. C., Mao, X. L., Lin, Q., Ji, W., Qin, W., Wei, W., & Huang, H. (2023). Unsupervised Cross-Modal Hashing via Semantic Text Mining. IEEE Transactions on Multimedia, 25, 8946-8957. https://doi.org/10.1109/TMM.2023.3243608

@article{c014dc006f1145758c7242db1de9f992,
title = "Unsupervised Cross-Modal Hashing via Semantic Text Mining",
abstract = "Cross-modal hashing has been widely used in multimedia retrieval tasks due to its fast retrieval speed and low storage cost. Recently, many deep unsupervised cross-modal hashing methods have been proposed to handle unlabeled datasets. These methods usually construct an instance similarity matrix by fusing the image and text modality-specific similarity matrices, and use it as guiding information to train the hashing networks. However, most of them directly use the cosine similarities between the bag-of-words (BoW) vectors of text datapoints to define the text modality-specific similarity matrix, which fails to mine the semantic similarity information contained in the text datapoints and leads to a poor-quality instance similarity matrix. To tackle this problem, we propose a novel method, Unsupervised Cross-modal Hashing via Semantic Text Mining (UCHSTM). Specifically, UCHSTM first mines the correlations between the words of text datapoints. Then, it constructs the text modality-specific similarity matrix for the training instances based on the mined correlations between their words. Next, it fuses the image and text modality-specific similarity matrices into the final instance similarity matrix, which guides the training of the hashing model. Furthermore, a novel self-redefined-similarity loss is proposed to correct some wrongly defined similarities in the constructed instance similarity matrix during training of the hashing networks, thereby further enhancing retrieval performance. Extensive experiments on two widely used datasets show that the proposed UCHSTM outperforms state-of-the-art baselines on cross-modal retrieval tasks.",
keywords = "Cross-modal retrieval, deep supervised hashing, self-redefined-similarity loss, semantic text mining",
author = "Tu, {Rong Cheng} and Mao, {Xian Ling} and Qinghong Lin and Wenjin Ji and Weize Qin and Wei Wei and Heyan Huang",
note = "Publisher Copyright: {\textcopyright} 1999-2012 IEEE.",
year = "2023",
doi = "10.1109/TMM.2023.3243608",
language = "English",
volume = "25",
pages = "8946--8957",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY  - JOUR
T1  - Unsupervised Cross-Modal Hashing via Semantic Text Mining
AU  - Tu, Rong Cheng
AU  - Mao, Xian Ling
AU  - Lin, Qinghong
AU  - Ji, Wenjin
AU  - Qin, Weize
AU  - Wei, Wei
AU  - Huang, Heyan
PY  - 2023
Y1  - 2023
N2  - Cross-modal hashing has been widely used in multimedia retrieval tasks due to its fast retrieval speed and low storage cost. Recently, many deep unsupervised cross-modal hashing methods have been proposed to handle unlabeled datasets. These methods usually construct an instance similarity matrix by fusing the image and text modality-specific similarity matrices, and use it as guiding information to train the hashing networks. However, most of them directly use the cosine similarities between the bag-of-words (BoW) vectors of text datapoints to define the text modality-specific similarity matrix, which fails to mine the semantic similarity information contained in the text datapoints and leads to a poor-quality instance similarity matrix. To tackle this problem, we propose a novel method, Unsupervised Cross-modal Hashing via Semantic Text Mining (UCHSTM). Specifically, UCHSTM first mines the correlations between the words of text datapoints. Then, it constructs the text modality-specific similarity matrix for the training instances based on the mined correlations between their words. Next, it fuses the image and text modality-specific similarity matrices into the final instance similarity matrix, which guides the training of the hashing model. Furthermore, a novel self-redefined-similarity loss is proposed to correct some wrongly defined similarities in the constructed instance similarity matrix during training of the hashing networks, thereby further enhancing retrieval performance. Extensive experiments on two widely used datasets show that the proposed UCHSTM outperforms state-of-the-art baselines on cross-modal retrieval tasks.
AB  - Cross-modal hashing has been widely used in multimedia retrieval tasks due to its fast retrieval speed and low storage cost. Recently, many deep unsupervised cross-modal hashing methods have been proposed to handle unlabeled datasets. These methods usually construct an instance similarity matrix by fusing the image and text modality-specific similarity matrices, and use it as guiding information to train the hashing networks. However, most of them directly use the cosine similarities between the bag-of-words (BoW) vectors of text datapoints to define the text modality-specific similarity matrix, which fails to mine the semantic similarity information contained in the text datapoints and leads to a poor-quality instance similarity matrix. To tackle this problem, we propose a novel method, Unsupervised Cross-modal Hashing via Semantic Text Mining (UCHSTM). Specifically, UCHSTM first mines the correlations between the words of text datapoints. Then, it constructs the text modality-specific similarity matrix for the training instances based on the mined correlations between their words. Next, it fuses the image and text modality-specific similarity matrices into the final instance similarity matrix, which guides the training of the hashing model. Furthermore, a novel self-redefined-similarity loss is proposed to correct some wrongly defined similarities in the constructed instance similarity matrix during training of the hashing networks, thereby further enhancing retrieval performance. Extensive experiments on two widely used datasets show that the proposed UCHSTM outperforms state-of-the-art baselines on cross-modal retrieval tasks.
KW  - Cross-modal retrieval
KW  - deep supervised hashing
KW  - self-redefined-similarity loss
KW  - semantic text mining
UR  - http://www.scopus.com/inward/record.url?scp=85149407617&partnerID=8YFLogxK
U2  - 10.1109/TMM.2023.3243608
DO  - 10.1109/TMM.2023.3243608
M3  - Article
AN  - SCOPUS:85149407617
SN  - 1520-9210
VL  - 25
SP  - 8946
EP  - 8957
JO  - IEEE Transactions on Multimedia
JF  - IEEE Transactions on Multimedia
ER  - 
