LogLog filter: Filtering cold items within a large range over high speed data streams

Peng Jia, Pinghui Wang*, Junzhou Zhao, Ye Yuan, Jing Tao, Xiaohong Guan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Citations (Scopus)

Abstract

Many real-world datasets arrive as data streams, and processing these streams is fundamental to applications such as anomaly detection. In this paper, we study three problems: computing item frequencies, finding top-k hot items, and detecting heavy changes. Widely used sketches consume a large amount of memory, and their performance degrades easily under the unbalanced distribution of data streams. To address this issue, the Cold Filter (CF) method was proposed to separate cold items from hot items and to record the frequencies of hot items in a separate structure. However, CF typically has a small filter range and is only effective for filtering cold items with small frequencies. In some real-world applications, the frequencies of cold items may exceed hundreds or even tens of thousands. To address these challenges, we exploit the "LogLog" structure and develop a memory-efficient method, LogLog Filter (LLF), to accurately estimate the above three metrics. LLF builds a register array in which each register approximately counts the sum of the frequencies of the items hashed into it. Our method substantially enlarges the filter range of CF with fewer bits, requiring only 4 bits to filter cold items with frequencies up to 2^(2^4). We conduct extensive experiments on real-world and synthetic datasets, and the experimental results demonstrate the efficiency and effectiveness of our method.
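The idea of a register that "approximately counts" a large sum in a few bits can be illustrated with a generic Morris-style logarithmic counter. The sketch below is not the authors' LLF algorithm; the class name `LogRegisterArray`, the base of 2, the hash function, and the `is_cold` threshold test are all assumptions chosen for illustration. Each register stores a small integer r, is incremented with probability base^(-r), and is decoded as (base^r - 1) / (base - 1).

```python
import hashlib
import random


class LogRegisterArray:
    """Illustrative only: an array of small registers, each holding a
    log-scale (Morris-style) approximation of the total count hashed into it."""

    def __init__(self, num_registers, bits=4, base=2.0, seed=0):
        self.m = num_registers
        self.max_value = (1 << bits) - 1      # 15 for 4-bit registers
        self.base = base                      # assumed base, not from the paper
        self.registers = [0] * num_registers
        self.rng = random.Random(seed)

    def _index(self, item):
        # Map an item to one register with a stable hash.
        digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.m

    def add(self, item):
        # Increment the register with probability base^(-r), so the register
        # value grows roughly with the logarithm of the true count.
        i = self._index(item)
        r = self.registers[i]
        if r < self.max_value and self.rng.random() < self.base ** (-r):
            self.registers[i] = r + 1

    def estimate(self, item):
        # Standard Morris estimator for the count behind register value r.
        r = self.registers[self._index(item)]
        return (self.base ** r - 1) / (self.base - 1)

    def is_cold(self, item, threshold):
        # Hypothetical filter test: treat the item as cold while the
        # estimated count of its register stays below the threshold.
        return self.estimate(item) < threshold


if __name__ == "__main__":
    arr = LogRegisterArray(num_registers=1024)
    for _ in range(5000):
        arr.add("flow-a")
    print(arr.estimate("flow-a"), arr.is_cold("flow-a", threshold=100))
```

With these assumed parameters (base 2, 4 bits), the largest value a register can represent is 2^15 - 1 = 32767. A larger base would extend that range at the cost of a higher estimation variance.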

Original language: English
Title of host publication: Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
Publisher: IEEE Computer Society
Pages: 804-815
Number of pages: 12
ISBN (Electronic): 9781728191843
Publication status: Published - Apr 2021
Event: 37th IEEE International Conference on Data Engineering, ICDE 2021 - Virtual, Chania, Greece
Duration: 19 Apr 2021 → 22 Apr 2021

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2021-April
ISSN (Print): 1084-4627

Conference

Conference: 37th IEEE International Conference on Data Engineering, ICDE 2021
Country/Territory: Greece
City: Virtual, Chania
Period: 19/04/21 → 22/04/21
