TY - JOUR
T1 - Fast filtering false active subspaces for efficient high dimensional similarity processing
AU - Wang, Guoren
AU - Yu, Ge
AU - Xin, Junchang
AU - Zhao, Yuhai
AU - Zhang, Ende
PY - 2009/2
Y1 - 2009/2
N2 - The query space of a similarity query is usually narrowed down by pruning inactive query subspaces, which contain no query results, and keeping active query subspaces, which may contain objects corresponding to the request. However, some active query subspaces may contain no query results at all; these are called false active query subspaces. The performance of query processing degrades in the presence of false active query subspaces. Our experiments show that this problem becomes serious when the data are high dimensional, and that the number of accesses to false active subspaces increases as the dimensionality increases. To solve this problem, this paper proposes a space mapping approach to reduce such unnecessary accesses. A given query space can be refined by filtering within its mapped space. To do so, a mapping strategy called maxgap is proposed to improve the efficiency of the refinement processing. Based on this mapping strategy, an index structure called MS-tree and query processing algorithms are presented. Finally, the performance of MS-tree is compared with that of other competitors in terms of range queries on a real data set.
AB - The query space of a similarity query is usually narrowed down by pruning inactive query subspaces, which contain no query results, and keeping active query subspaces, which may contain objects corresponding to the request. However, some active query subspaces may contain no query results at all; these are called false active query subspaces. The performance of query processing degrades in the presence of false active query subspaces. Our experiments show that this problem becomes serious when the data are high dimensional, and that the number of accesses to false active subspaces increases as the dimensionality increases. To solve this problem, this paper proposes a space mapping approach to reduce such unnecessary accesses. A given query space can be refined by filtering within its mapped space. To do so, a mapping strategy called maxgap is proposed to improve the efficiency of the refinement processing. Based on this mapping strategy, an index structure called MS-tree and query processing algorithms are presented. Finally, the performance of MS-tree is compared with that of other competitors in terms of range queries on a real data set.
KW - False active subspace
KW - High dimensional index
KW - Refining processing
UR - http://www.scopus.com/inward/record.url?scp=65249143872&partnerID=8YFLogxK
U2 - 10.1007/s11432-009-0051-7
DO - 10.1007/s11432-009-0051-7
M3 - Article
AN - SCOPUS:65249143872
SN - 1009-2757
VL - 52
SP - 286
EP - 294
JO - Science in China, Series F: Information Sciences
JF - Science in China, Series F: Information Sciences
IS - 2
ER -