Indexing probabilistic data for supporting range query over big data

Rui Zhu, Bin Wang*, Xiao Chun Yang, Guo Ren Wang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

1 Citation (Scopus)

Abstract

As data volumes grow, big data management becomes increasingly important. Among popular mathematical models, the probabilistic model is well suited to big data management because it can compress a large volume of data into a small amount of probabilistic data. It is therefore important to study probabilistic data management in big data environments. As a classic query type, the range query over probabilistic data has been studied extensively. However, state-of-the-art approaches are not suitable because they all suffer from high update costs. In this paper, we propose a novel index, the HGD-Tree, to solve this problem. First, we propose a group of novel strategies for handling newly arriving objects, which let us perform insertion, deletion, and update efficiently while keeping the tree structure balanced. Second, we propose a novel partition-based structure to approximate the probability density function of an object; the structure self-adjusts its partition resolution to suit the underlying uncertain data. This structure is also represented by a small number of bit vectors. Together, these two strategies guarantee a low space cost for the proposed index. Finally, we propose a novel range query algorithm that prunes effectively using only a few bitwise operations. Theoretical analysis and extensive experimental results demonstrate the effectiveness of the proposed algorithms.
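The abstract does not give the details of the HGD-Tree, so the following is only a minimal sketch of the general idea of bit-vector pruning for a range query, not the authors' structure. It assumes a fixed-resolution grid over the unit square (the paper's structure adapts its resolution), objects given as sample points, and half-open query rectangles. All names (`object_mask`, `query_masks`, `classify`) are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # half-open: [x_lo, x_hi) x [y_lo, y_hi)


def cell_index(x: float, y: float, g: int) -> int:
    """Map a point in [0, 1)^2 to its bit position on a g x g grid."""
    cx = min(int(x * g), g - 1)
    cy = min(int(y * g), g - 1)
    return cy * g + cx


def object_mask(samples: Iterable[Point], g: int) -> int:
    """Bit vector with one bit set per grid cell that holds a sample."""
    mask = 0
    for x, y in samples:
        mask |= 1 << cell_index(x, y, g)
    return mask


def query_masks(rect: Rect, g: int) -> Tuple[int, int]:
    """Return (outer, inner): cells the query overlaps, and cells it fully covers."""
    x_lo, y_lo, x_hi, y_hi = rect
    outer = inner = 0
    for cy in range(g):
        for cx in range(g):
            cx_lo, cx_hi = cx / g, (cx + 1) / g
            cy_lo, cy_hi = cy / g, (cy + 1) / g
            if not (cx_lo < x_hi and cx_hi > x_lo and cy_lo < y_hi and cy_hi > y_lo):
                continue
            bit = 1 << (cy * g + cx)
            outer |= bit
            if x_lo <= cx_lo and cx_hi <= x_hi and y_lo <= cy_lo and cy_hi <= y_hi:
                inner |= bit
    return outer, inner


def classify(obj: int, outer: int, inner: int) -> str:
    """Decide an object's fate with two bitwise tests."""
    if obj & outer == 0:
        return "pruned"      # no occupied cell touches the query
    if obj & ~inner == 0:
        return "accepted"    # every occupied cell lies fully inside the query
    return "candidate"       # needs an exact probability computation


outer, inner = query_masks((0.25, 0.25, 0.75, 0.75), 4)
far = object_mask([(0.1, 0.1), (0.2, 0.05)], 4)
inside = object_mask([(0.3, 0.3), (0.6, 0.7)], 4)
straddling = object_mask([(0.3, 0.3), (0.9, 0.9)], 4)
```

In this sketch, only objects classified as "candidate" would need their appearance probability in the query region computed exactly; the other two outcomes are settled by a single AND each.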

Original language: English
Pages (from-to): 1929-1946
Number of pages: 18
Journal: Jisuanji Xuebao/Chinese Journal of Computers
Volume: 39
Issue number: 10
Publication status: Published - 1 Oct 2016
Externally published: Yes

Keywords

  • Big-data
  • Index
  • Multi-resolution grid
  • Range query
  • Summary
