Finding needles in a hay stream: On persistent item lookup in data streams

Lin Chen*, Haipeng Dai, Lei Meng, Jihong Yu

*此作品的通讯作者

科研成果: 期刊稿件文章同行评审

5 引用 (Scopus)

摘要

In a data stream composed of an ordered sequence of data items, persistent items refer to those persisting to occur over a long timespan. Compared with ordinary items, persistent ones, though not necessarily occurring more frequently, typically convey more valuable information. Persistent item lookup, the functionality to identify all persistent items, emerges as a pivotal building block in many computing and network systems. In this paper, we devise a generic persistent item lookup algorithm supporting high-speed, high-accuracy lookup with limited memory cost. The key technicalities we propose in our design are two-fold. First, our algorithm attempts to record only persistent items seen so far based on the currently available information about the stream, thus significantly reducing memory overhead, especially for real-life highly skewed data streams. Second, our algorithm balances the recording load in both time and space domains: in the time domain, we partition persistent items into approximately equal-size subsets and record only one subset in each epoch; in the space domain, we apply the state-of-the-art load balancing technique to evenly distribute recorded items across the on-die memory. By holistically integrating these components, we iron out a persistent item lookup algorithm outperforming existing solutions in a wide range of practical settings.

源语言英语
文章编号107518
期刊Computer Networks
181
DOI
出版状态已出版 - 9 11月 2020

指纹

探究 'Finding needles in a hay stream: On persistent item lookup in data streams' 的科研主题。它们共同构成独一无二的指纹。

引用此