Intra-Category Aware Hierarchical Supervised Document Hashing

Jia Nan Guo, Xian Ling Mao*, Wei Wei, Heyan Huang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Document hashing is a powerful paradigm for document retrieval: it maps high-dimensional documents to compact hash codes while preserving the similarity of the original data. While fairly successful, existing document hashing methods consider neither the relevance relationship among different documents within a category nor the hierarchical relationship among categories. Intuitively, intra-category relevance connects related concepts across different documents, which can supplement the information omitted from each individual document. Meanwhile, hierarchical categories can help identify whether a mistake occurs in a leaf category or in a parent category; this distinction can be used to reduce mistakes in parent categories, which are often more serious. Inspired by these intuitions, we propose a novel Intra-category aware Hierarchical supervised Document Hashing method, called IHDH. Specifically, IHDH is a binary autoencoder architecture equipped with two novel components: an intra-category component and a hierarchy component. The intra-category component exploits the differences among the latent semantic representations of different documents from the same category to supplement the omitted information for each document. The hierarchy component uses the hierarchical structure to transform the probabilities of leaf categories into the probabilities of parent categories via a union operation, and then applies an additional parent-level penalty to reduce mistakes in parent categories. Extensive experiments on three benchmark datasets show that IHDH significantly outperforms state-of-the-art baselines.
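To make the hierarchy component more concrete, here is a minimal sketch of one plausible reading of "union operation plus parent-level penalty". It is not the authors' implementation. It assumes that leaf-category probabilities come from a softmax, so leaf categories are mutually exclusive and the probability of the union of a parent's leaves is simply the sum of their probabilities. The names `leaf_to_parent`, `parent_probs`, `hierarchical_loss`, and `parent_weight` are hypothetical, and plain cross-entropy is assumed for both the leaf-level and the parent-level terms.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over leaf-category logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def parent_probs(leaf_probs: Sequence[float],
                 leaf_to_parent: Sequence[int],
                 num_parents: int) -> List[float]:
    """Hypothetical union step: softmax leaves are mutually exclusive,
    so P(parent) is the sum of P(leaf) over that parent's leaves."""
    probs = [0.0] * num_parents
    for leaf, p in enumerate(leaf_probs):
        probs[leaf_to_parent[leaf]] += p
    return probs


def hierarchical_loss(logits: Sequence[float],
                      true_leaf: int,
                      leaf_to_parent: Sequence[int],
                      num_parents: int,
                      parent_weight: float = 1.0) -> float:
    """Leaf-level cross-entropy plus an extra parent-level penalty
    (parent_weight is an assumed hyperparameter)."""
    leaf_p = softmax(logits)
    leaf_loss = -math.log(leaf_p[true_leaf])
    parent_p = parent_probs(leaf_p, leaf_to_parent, num_parents)
    parent_loss = -math.log(parent_p[leaf_to_parent[true_leaf]])
    return leaf_loss + parent_weight * parent_loss


# Toy hierarchy (assumed): leaves 0 and 1 belong to parent 0, leaves 2 and 3 to parent 1.
leaf_to_parent = [0, 0, 1, 1]
# The true leaf is 0. Case A puts most mass on sibling leaf 1 (same parent);
# case B puts the same mass on leaf 2 (a different parent).
loss_a = hierarchical_loss([0.0, 2.0, 0.0, 0.0], 0, leaf_to_parent, 2)
loss_b = hierarchical_loss([0.0, 0.0, 2.0, 0.0], 0, leaf_to_parent, 2)
print(loss_a < loss_b)
```

In this toy case both predictions receive the same leaf-level loss, because the true leaf gets the same probability either way. Only the parent-level term separates them, so the mistake that crosses into a different parent category is penalized more heavily, which matches the intuition stated in the abstract.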

Original language: English
Pages (from-to): 6003-6013
Number of pages: 11
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 35
Issue number: 6
Publication status: Published - 1 Jun 2023

Keywords

  • Semantic hashing
  • document retrieval
  • hierarchical categories

