基于高速乱序流的Top-k连续查询算法

Translated title of the contribution: Continuous Top-k Query Over High Speed Unorder Streaming Data

Rui Zhu, Bin Wang*, Xiao Chun Yang, Guo Ren Wang

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

3 Citations (Scopus)

Abstract

Continuous top-k query over sliding window is a fundamental problem in the streaming data environment. It monitors the query window, and returns the k objects with highest scores when the window slides. Due to its importance, many researchers have studied this problem, where many efficient algorithms are proposed accordingly. However, these algorithms are heavily influenced by an assumption that data streams can be viewed as sequences of data that arrived in order. Unfortunately, this assumption is not true in real applications, since streams are not often so well behaved and disruptions of various sorts are endemic. Worse still, these algorithms are sensitive to the order relationship among streaming data, where they are not able to efficiently support continuous top-k query over un-ordered streaming data. Due to the universal of un-ordered streams and the importance of continuous top-k query, in this paper, we study the problem of continuous top-k query over un-ordered streaming data. Compared with the traditional problem definition that returns k objects with highest scores in the window, it returns the k objects with highest scores that are both arriving in the window and satisfying other time-sequence constraints. In this paper, we propose a novel framework called GSTopK(Group Stack TopK) to support top-k query over un-ordered stream.The key of GSTopK is to efficiently maintain a subset of the objects in the window. When the window slides, the new results could be retrieved from this subset as much as possible. In order to efficiently maintain candidates, GSTopK should prune newly arrival objects that are having no chance to become query results for one thing. For another, GSTopK should be efficiently maintain candidates and insensitive to the order relationship among streaming data. In order to achieve the first goal, we propose the concept of boundary window and boundary objects. In this way, we could prune newly arrived objects via maintaining a group boundary objects. We use them as the keys of hash-filter, and prune newly arrived objects created at different moments. Note that, our proposed hash-filters fully consider the distribution of streaming data, which could effectively work under both time-unrelated data distribution and time-related data distribution. In addition, we can make sure that the pruned objects have no chance to become the query results. In order to achieve the second goal, we propose a stack-based algorithm to maintain candidates, where the candidates are the objects that are not pruned by the filter. One benefit is we only utilize stack operations to maintain both the dominance and ordered relationship among candidates. Compared with the other efforts, stack-based algorithm could reduce the cost a lot. Another benefit of this algorithm is that it is insensitive to the order relationship among streaming data. Assuming the window length is N, and the stream speed is s, theoretical analysis result shows that the incremental maintenance complexity of GSTopK could be reduced from O(N) to O(max(1,Nk/s 2)). Last of all, we evaluate the performance of our proposed framework. The experiment results show that GSTopK is efficient and insensitive to the order relationship among streaming data.

Translated title of the contributionContinuous Top-k Query Over High Speed Unorder Streaming Data
Original languageChinese (Traditional)
Pages (from-to)1693-1708
Number of pages16
JournalJisuanji Xuebao/Chinese Journal of Computers
Volume41
Issue number8
DOIs
Publication statusPublished - 1 Aug 2018
Externally publishedYes

Fingerprint

Dive into the research topics of 'Continuous Top-k Query Over High Speed Unorder Streaming Data'. Together they form a unique fingerprint.

Cite this