TY - JOUR
T1 - Indexing probabilistic data for supporting range query over big data
AU - Zhu, Rui
AU - Wang, Bin
AU - Yang, Xiao Chun
AU - Wang, Guo Ren
N1 - Publisher Copyright:
© 2016, Science Press. All right reserved.
PY - 2016/10/1
Y1 - 2016/10/1
N2 - With the increasing of data scale, big data management is great significant. Underlying the popular mathematical models, probabilistic model is suitable for big data management since it could compress volume of data into a few probabilistic data. Therefore, it is significant for studying the problem of probabilistic data management over big data environment. As a classic query, range query over probabilistic data has been fully studied. However, the state of art efforts are not suitable since they all suffer from highly updating cost. In this paper, we propose a novel index named HGD-Tree for solving this problem. First of all, we propose a group of novel strategies for handling newly arrival objects. In this way, we could efficiently apply the insertion, deletion, and updating on the premise of balancing tree structure. In addition, we propose a novel partition-based structure to approach the probability density function of object, where the structure could self-adjust the partition resolution so as to cater for the underlying of uncertain data. Besides, our proposed structure is expressed by a few bit vectors. The above two strategies guarantee low space cost of the proposed index. Last but not least, we propose a novel algorithm for supporting the range query which could effectively apply the pruning under few bitwise operations. Theoretical analysis and extensive experimental results demonstrate the effectiveness of the proposed algorithms.
AB - With the increasing of data scale, big data management is great significant. Underlying the popular mathematical models, probabilistic model is suitable for big data management since it could compress volume of data into a few probabilistic data. Therefore, it is significant for studying the problem of probabilistic data management over big data environment. As a classic query, range query over probabilistic data has been fully studied. However, the state of art efforts are not suitable since they all suffer from highly updating cost. In this paper, we propose a novel index named HGD-Tree for solving this problem. First of all, we propose a group of novel strategies for handling newly arrival objects. In this way, we could efficiently apply the insertion, deletion, and updating on the premise of balancing tree structure. In addition, we propose a novel partition-based structure to approach the probability density function of object, where the structure could self-adjust the partition resolution so as to cater for the underlying of uncertain data. Besides, our proposed structure is expressed by a few bit vectors. The above two strategies guarantee low space cost of the proposed index. Last but not least, we propose a novel algorithm for supporting the range query which could effectively apply the pruning under few bitwise operations. Theoretical analysis and extensive experimental results demonstrate the effectiveness of the proposed algorithms.
KW - Big-data
KW - Index
KW - Multi-resolution grid
KW - Range query
KW - Summary
UR - http://www.scopus.com/inward/record.url?scp=84992358966&partnerID=8YFLogxK
U2 - 10.11897/SP.J.1016.2016.01929
DO - 10.11897/SP.J.1016.2016.01929
M3 - Article
AN - SCOPUS:84992358966
SN - 0254-4164
VL - 39
SP - 1929
EP - 1946
JO - Jisuanji Xuebao/Chinese Journal of Computers
JF - Jisuanji Xuebao/Chinese Journal of Computers
IS - 10
ER -