TY - GEN
T1 - Locality-sensitive hashing for finding nearest neighbors in probability distributions
AU - Tang, Yi Kun
AU - Mao, Xian Ling
AU - Hao, Yi Jing
AU - Xu, Cheng
AU - Huang, Heyan
N1 - Publisher Copyright:
© Springer Nature Singapore Pte Ltd. 2017.
PY - 2017
Y1 - 2017
N2 - In the past ten years, new powerful algorithms based on efficient data structures have been proposed to solve the problem of Approximate Nearest Neighbors search (ANN). To find the nearest neighbors in probability-distribution-type data, the existing Locality Sensitive Hashing (LSH) algorithms for vector-type data can be directly used to solve it. However, these methods do not consider the special properties of probability distributions. In this paper, based on the special properties of probability distributions, we present a novel LSH scheme adapted to angular distance for ANN search in high-dimensional probability distributions. We define the specific hashing functions, and prove their local-sensitivity. Also, we propose a Sequential Interleaving algorithm based on the “Unbalance Effect” of Euclidean and angular metrics for probability distributions. Finally, we compare, through experiments, our methods with the state-of-the-art LSH algorithms in the context of ANN on six public image databases. The results prove the proposed algorithms can provide far better accuracy in the context of ANN than baselines.
AB - In the past ten years, new powerful algorithms based on efficient data structures have been proposed to solve the problem of Approximate Nearest Neighbors search (ANN). To find the nearest neighbors in probability-distribution-type data, the existing Locality Sensitive Hashing (LSH) algorithms for vector-type data can be directly used to solve it. However, these methods do not consider the special properties of probability distributions. In this paper, based on the special properties of probability distributions, we present a novel LSH scheme adapted to angular distance for ANN search in high-dimensional probability distributions. We define the specific hashing functions, and prove their local-sensitivity. Also, we propose a Sequential Interleaving algorithm based on the “Unbalance Effect” of Euclidean and angular metrics for probability distributions. Finally, we compare, through experiments, our methods with the state-of-the-art LSH algorithms in the context of ANN on six public image databases. The results prove the proposed algorithms can provide far better accuracy in the context of ANN than baselines.
KW - Approximate nearest neighbors
KW - Arccos-distance
KW - Locality sensitive hashing
UR - http://www.scopus.com/inward/record.url?scp=85034231045&partnerID=8YFLogxK
U2 - 10.1007/978-981-10-6805-8_1
DO - 10.1007/978-981-10-6805-8_1
M3 - Conference contribution
AN - SCOPUS:85034231045
SN - 9789811068041
T3 - Communications in Computer and Information Science
SP - 3
EP - 15
BT - Social Media Processing - 6th National Conference, SMP 2017, Proceedings
A2 - Liu, Huan
A2 - Xie, Xing
A2 - Cheng, Xueqi
A2 - Shen, Huawei
A2 - Ma, Weiying
A2 - Feng, Shizheng
PB - Springer Verlag
T2 - 6th National Conference on Social Media Processing, SMP 2017
Y2 - 14 September 2017 through 17 September 2017
ER -