Locality-sensitive hashing for finding nearest neighbors in probability distributions

Yi Kun Tang, Xian Ling Mao*, Yi Jing Hao, Cheng Xu, Heyan Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Citations (Scopus)

Abstract

In the past ten years, new powerful algorithms based on efficient data structures have been proposed to solve the problem of Approximate Nearest Neighbors search (ANN). To find the nearest neighbors in probability-distribution-type data, the existing Locality Sensitive Hashing (LSH) algorithms for vector-type data can be directly used to solve it. However, these methods do not consider the special properties of probability distributions. In this paper, based on the special properties of probability distributions, we present a novel LSH scheme adapted to angular distance for ANN search in high-dimensional probability distributions. We define the specific hashing functions, and prove their local-sensitivity. Also, we propose a Sequential Interleaving algorithm based on the “Unbalance Effect” of Euclidean and angular metrics for probability distributions. Finally, we compare, through experiments, our methods with the state-of-the-art LSH algorithms in the context of ANN on six public image databases. The results prove the proposed algorithms can provide far better accuracy in the context of ANN than baselines.

Original languageEnglish
Title of host publicationSocial Media Processing - 6th National Conference, SMP 2017, Proceedings
EditorsHuan Liu, Xing Xie, Xueqi Cheng, Huawei Shen, Weiying Ma, Shizheng Feng
PublisherSpringer Verlag
Pages3-15
Number of pages13
ISBN (Print)9789811068041
DOIs
Publication statusPublished - 2017
Event6th National Conference on Social Media Processing, SMP 2017 - Beijing, China
Duration: 14 Sept 201717 Sept 2017

Publication series

NameCommunications in Computer and Information Science
Volume774
ISSN (Print)1865-0929

Conference

Conference6th National Conference on Social Media Processing, SMP 2017
Country/TerritoryChina
CityBeijing
Period14/09/1717/09/17

Keywords

  • Approximate nearest neighbors
  • Arccos-distance
  • Locality sensitive hashing

Fingerprint

Dive into the research topics of 'Locality-sensitive hashing for finding nearest neighbors in probability distributions'. Together they form a unique fingerprint.

Cite this