LASH: Large-Scale Academic Deep Semantic Hashing

Jia Nan Guo, Xian Ling Mao*, Tian Lan, Rong Xin Tu, Wei Wei, Heyan Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

With the explosive growth in the number of academic papers, efficient academic document retrieval is becoming an essential requirement for large-scale information retrieval systems. Given the success of deep semantic hashing in general document retrieval, it is a promising approach for academic document retrieval, mapping academic documents into efficient hash codes. However, for academic document retrieval, existing deep semantic hashing methods suffer from the following two problems: (1) they cannot differentiate the importance of different field labels; (2) they cannot fully utilize the structure information in paper citations. To address these problems, we propose a novel Large-scale Academic deep Semantic Hashing method, called LASH. Specifically, LASH first treats paper citations as a citation network, and then employs a multi-input deep autoencoder to directly encode both the structure information of the citation network and the semantic information of academic documents into unified hash codes. Moreover, a weighted percentage similarity, defined as a linear combination of Jaccard and cosine similarity, is designed to measure the importance of different field labels. Supervised by this similarity, the learned unified hash codes can further preserve the importance of different field labels. Extensive experiments show that LASH significantly outperforms state-of-the-art baselines on three proposed real-world large-scale academic document datasets.
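As a rough illustration of the similarity described in the abstract, the sketch below combines Jaccard and cosine similarity over two papers' field-label sets. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function names, the mixing weight `alpha`, and the use of unweighted binary label vectors. It is not the authors' definition of the weighted percentage similarity.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two label sets: |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def cosine(a: set, b: set) -> float:
    """Cosine similarity of the binary indicator vectors of two label sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def combined_similarity(a: set, b: set, alpha: float = 0.5) -> float:
    """Linear combination of Jaccard and cosine similarity.

    `alpha` is a hypothetical mixing weight in [0, 1]; the abstract does not
    specify how the two terms are weighted.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * jaccard(a, b) + (1.0 - alpha) * cosine(a, b)


if __name__ == "__main__":
    paper_a = {"information retrieval", "semantic hashing"}
    paper_b = {"information retrieval", "natural language processing"}
    print(combined_similarity(paper_a, paper_b, alpha=0.5))
```

In a supervised hashing setup, a score of this kind could serve as the target similarity between pairs of documents, so that papers sharing more field labels are pushed toward closer hash codes.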

Original language: English
Pages (from-to): 1734-1746
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 35
Issue number: 2
Publication status: Published - 1 Feb 2023

Keywords

  • Information retrieval
  • academic paper
  • citation network
  • document retrieval
  • semantic hashing
