LogLog filter: Filtering cold items within a large range over high speed data streams

Peng Jia, Pinghui Wang*, Junzhou Zhao, Ye Yuan, Jing Tao, Xiaohong Guan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

14 Citations (Scopus)

Abstract

Many real-world datasets arrive as data streams, and processing these streams is fundamental to applications such as anomaly detection. In this paper, we study the problems of computing item frequencies, finding top-k hot items, and detecting heavy changes. The widely used sketches consume a large amount of memory, and their performance is easily degraded by the unbalanced distribution of data streams. To address this issue, a novel method, Cold Filter (CF), is proposed to separate cold items from hot items and to use a separate structure to record the frequencies of hot items. Typically, CF has a small filter range and is only effective for filtering cold items with small frequencies. For some real-world applications, however, the frequencies of cold items may exceed hundreds or even tens of thousands. To solve the above challenges, we exploit the "LogLog" structure and develop a memory-efficient method, LogLog Filter (LLF), to accurately estimate the above three metrics. LLF builds a register array in which each register approximately counts the sum of the frequencies of the items hashed into it. Our method remarkably enlarges the filter range of CF with fewer bits and requires only 4 bits to filter cold items with frequencies up to $2^{24}$. We conduct extensive experiments on real-world and synthetic datasets, and the experimental results demonstrate the efficiency and effectiveness of our method.
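
The register-array idea in the abstract can be illustrated with a minimal sketch. This is not the authors' LLF algorithm, whose update rule and encoding are not given in this record. It swaps in a classic Morris-style approximate counter per register, and the class name `ApproxRegisterFilter`, its parameters, and the salted SHA-256 hashing are all hypothetical choices made for illustration.

```python
import hashlib
import random


class ApproxRegisterFilter:
    """Hypothetical sketch: an array of small Morris-style counters.

    Not the paper's LLF; only an illustration of registers that
    approximately count the arrivals hashed into them.
    """

    def __init__(self, num_registers=1024, bits=4, num_hashes=3, seed=0):
        self.registers = [0] * num_registers
        self.max_value = (1 << bits) - 1  # a 4-bit register saturates at 15
        self.num_hashes = num_hashes
        self.rng = random.Random(seed)

    def _indexes(self, item):
        # Derive num_hashes register positions from salted SHA-256 digests.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.registers)

    def add(self, item):
        for i in self._indexes(item):
            r = self.registers[i]
            # Morris update: increment with probability 2^-r, so the
            # register grows roughly with log2 of the arrivals it has seen.
            if r < self.max_value and self.rng.random() < 2.0 ** -r:
                self.registers[i] = r + 1

    def estimate(self, item):
        # A register holding r estimates 2^r - 1 arrivals. Taking the
        # minimum over the hashed registers limits collision over-counting.
        return min(2 ** self.registers[i] - 1 for i in self._indexes(item))

    def is_cold(self, item, threshold):
        # Items whose estimate stays below the threshold are treated as cold.
        return self.estimate(item) < threshold
```

In this base-2 sketch a 4-bit register saturates at an estimate of 2^15 - 1, which is smaller than the filter range reported in the abstract, so the paper's encoding must differ from the one assumed here. An item that passes the `is_cold` check would be kept in the filter, while an item that exceeds the threshold would be handed to a separate structure for hot items.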

Original language: English
Title of host publication: Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
Publisher: IEEE Computer Society
Pages: 804-815
Number of pages: 12
ISBN (Electronic): 9781728191843
Publication status: Published - Apr 2021
Event: 37th IEEE International Conference on Data Engineering, ICDE 2021 - Virtual, Chania, Greece
Duration: 19 Apr 2021 to 22 Apr 2021

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2021-April
ISSN (Print): 1084-4627

Conference

Conference: 37th IEEE International Conference on Data Engineering, ICDE 2021
Country/Territory: Greece
City: Virtual, Chania
Period: 19/04/21 to 22/04/21
