TY - JOUR
T1 - Intra-Category Aware Hierarchical Supervised Document Hashing
AU - Guo, Jia Nan
AU - Mao, Xian Ling
AU - Wei, Wei
AU - Huang, Heyan
N1 - Publisher Copyright: © 1989-2012 IEEE.
PY - 2023/6/1
Y1 - 2023/6/1
AB - Document hashing is a powerful paradigm for document retrieval: it maps high-dimensional documents to compact hash codes while preserving the similarity of the original data. Although fairly successful, existing document hashing methods consider neither the relevance relationships among documents within the same category nor the hierarchical relationships among categories. Intuitively, intra-category relevance connects related concepts across documents and can thus supply information that an individual document omits; meanwhile, the category hierarchy helps identify whether a mistake occurs at the leaf level or the parent level, which can be used to reduce parent-level mistakes, which are often the more serious ones. Inspired by these intuitions, we propose a novel Intra-category aware Hierarchical supervised Document Hashing method, called IHDH. Specifically, IHDH is a binary autoencoder architecture equipped with two novel components: an intra-category component and a hierarchy component. The intra-category component exploits the differences among the latent semantic representations of documents from the same category to supplement the information omitted from each document. The hierarchy component uses the hierarchical structure to transform the probabilities of leaf categories into the probabilities of parent categories via a union operation, and then applies an additional parent-level penalty to reduce mistakes occurring in parent categories. Extensive experiments on three benchmark datasets show that IHDH significantly outperforms state-of-the-art baselines.
KW - Semantic hashing
KW - document retrieval
KW - hierarchical categories
UR - http://www.scopus.com/inward/record.url?scp=85127056111&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2022.3161807
DO - 10.1109/TKDE.2022.3161807
M3 - Article
AN - SCOPUS:85127056111
SN - 1041-4347
VL - 35
SP - 6003
EP - 6013
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 6
ER -