Intra-Category Aware Hierarchical Supervised Document Hashing

Jia Nan Guo, Xian Ling Mao*, Wei Wei, Heyan Huang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Document hashing is a powerful paradigm for document retrieval: it maps high-dimensional documents to compact hash codes while preserving the similarity of the original data. Although fairly successful, existing document hashing methods consider neither the relevance relationships among documents within the same category nor the hierarchical relationships among categories. Intuitively, intra-category relevance connects related concepts across different documents and can therefore supplement the information omitted from each individual document. Meanwhile, hierarchical categories help identify whether a mistake occurs at the leaf-category level or the parent-category level, which can be used to reduce mistakes in parent categories, which are often more serious. Inspired by these intuitions, we propose a novel Intra-category aware Hierarchical supervised Document Hashing method, called IHDH. Specifically, IHDH is a binary autoencoder architecture equipped with two novel components: an intra-category component and a hierarchy component. The intra-category component exploits the differences among the latent semantic representations of different documents from the same category to supplement the information omitted from each document. The hierarchy component uses the hierarchical structure to transform the probabilities of leaf categories into the probabilities of parent categories through a union operation, and then applies an additional parent-level penalty to reduce mistakes in parent categories. Extensive experiments on three benchmark datasets show that IHDH significantly outperforms state-of-the-art baselines.
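
The hierarchy component described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the leaf probabilities are taken to come from a softmax (so leaves are mutually exclusive and the union of a parent's leaves reduces to a sum), the parent-level penalty is taken to be a cross-entropy term, and the names `HIERARCHY`, `parent_probabilities`, `hierarchical_loss`, and `parent_weight` are hypothetical.

```python
import math

# Hypothetical two-level hierarchy: parent category -> indices of its leaf categories.
HIERARCHY = {"science": [0, 1], "sports": [2, 3]}


def softmax(logits):
    """Turn raw leaf scores into a probability distribution over leaves."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def parent_probabilities(leaf_probs, hierarchy):
    """Assumed union rule: leaves are mutually exclusive, so a parent's
    probability is the sum of the probabilities of its leaves."""
    return {parent: sum(leaf_probs[i] for i in leaves)
            for parent, leaves in hierarchy.items()}


def hierarchical_loss(logits, true_leaf, hierarchy, parent_weight=1.0):
    """Leaf-level cross-entropy plus a weighted parent-level cross-entropy
    penalty (an assumed form of the parent-level penalty)."""
    leaf_probs = softmax(logits)
    parent_probs = parent_probabilities(leaf_probs, hierarchy)
    true_parent = next(p for p, leaves in hierarchy.items() if true_leaf in leaves)
    leaf_loss = -math.log(leaf_probs[true_leaf])
    parent_loss = -math.log(parent_probs[true_parent])
    return leaf_loss + parent_weight * parent_loss


# Two wrong predictions for a document whose true leaf is 0 ("science"):
sibling_mistake = hierarchical_loss([1.0, 3.0, 0.0, 0.0], 0, HIERARCHY)
cross_parent_mistake = hierarchical_loss([1.0, 0.0, 3.0, 0.0], 0, HIERARCHY)
```

Under these assumptions, both predictions assign the same probability to the true leaf, but the one that favors a leaf under a different parent receives the larger loss, which is the behavior the parent-level penalty is meant to encourage.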

Original language: English
Pages (from-to): 6003-6013
Number of pages: 11
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 35
Issue: 6
DOI
Publication status: Published - 1 Jun 2023
