TY - JOUR
T1 - Batch-Orthogonal Locality-Sensitive Hashing for Angular Similarity
AU - Ji, Jianqiu
AU - Yan, Shuicheng
AU - Li, Jianmin
AU - Gao, Guangyu
AU - Tian, Qi
AU - Zhang, Bo
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2014/10/1
Y1 - 2014/10/1
N2 - Sign-random-projection locality-sensitive hashing (SRP-LSH) is a widely used hashing method, which provides an unbiased estimate of pairwise angular similarity, yet may suffer from its large estimation variance. We propose in this work batch-orthogonal locality-sensitive hashing (BOLSH), as a significant improvement of SRP-LSH. Instead of independent random projections, BOLSH makes use of batch-orthogonalized random projections, i.e., we divide random projection vectors into several batches and orthogonalize the vectors in each batch respectively. These batch-orthogonalized random projections partition the data space into regular regions, and thus provide a more accurate estimator. We prove theoretically that BOLSH still provides an unbiased estimate of pairwise angular similarity, with a smaller variance for any angle in (0,π), compared with SRP-LSH. Furthermore, we give a lower bound on the reduction of variance. The extensive experiments on real data well validate that with the same length of binary code, BOLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, BOLSH shows the superiority in extensive approximate nearest neighbor (ANN) retrieval experiments.
AB - Sign-random-projection locality-sensitive hashing (SRP-LSH) is a widely used hashing method, which provides an unbiased estimate of pairwise angular similarity, yet may suffer from its large estimation variance. We propose in this work batch-orthogonal locality-sensitive hashing (BOLSH), as a significant improvement of SRP-LSH. Instead of independent random projections, BOLSH makes use of batch-orthogonalized random projections, i.e., we divide random projection vectors into several batches and orthogonalize the vectors in each batch respectively. These batch-orthogonalized random projections partition the data space into regular regions, and thus provide a more accurate estimator. We prove theoretically that BOLSH still provides an unbiased estimate of pairwise angular similarity, with a smaller variance for any angle in (0,π), compared with SRP-LSH. Furthermore, we give a lower bound on the reduction of variance. The extensive experiments on real data well validate that with the same length of binary code, BOLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, BOLSH shows the superiority in extensive approximate nearest neighbor (ANN) retrieval experiments.
KW - Sign-random-projection
KW - angular similarity
KW - approximate nearest neighbor search
KW - locality-sensitive hashing
UR - http://www.scopus.com/inward/record.url?scp=84933037463&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2014.2315806
DO - 10.1109/TPAMI.2014.2315806
M3 - Article
AN - SCOPUS:84933037463
SN - 0162-8828
VL - 36
SP - 1963
EP - 1974
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 10
M1 - 6783789
ER -