Approximately Counting Butterflies in Large Bipartite Graph Streams

Rundong Li, Pinghui Wang*, Peng Jia, Xiangliang Zhang, Junzhou Zhao, Jing Tao, Ye Yuan, Xiaohong Guan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Bipartite graphs widely exist in real-world scenarios and model binary relations like host-website, author-paper, and user-product. In bipartite graphs, a butterfly (i.e., a $2\times 2$ bi-clique) is the smallest non-trivial cohesive structure and plays an important role in applications such as anomaly detection. Considerable efforts focus on counting butterflies in static bipartite graphs. However, these methods suffer from high time and space complexity when the bipartite graph of interest is given as a stream of edges. Although there are methods for approximately counting butterflies from bipartite graph streams, they suffer from either low accuracy or high time complexity. Therefore, it is still a challenge to accurately estimate butterfly counts from bipartite graph streams in a short time. To address this issue, we develop novel algorithms that exploit the bipartite nature of the graph and subtly integrate sampling and sketching techniques. We provide accurate estimators for butterfly counts and derive simple yet exact formulas for bounding their errors. We also conduct extensive experiments on a variety of real-world large bipartite graphs. Experimental results demonstrate that our algorithms are up to 20.0 times more accurate and up to 286.3 times faster than state-of-the-art methods under the same memory usage.
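
To make the butterfly definition concrete, the following is a minimal Python sketch and not the algorithms proposed in the article. It counts butterflies exactly in a small static edge list, then shows a naive uniform edge-sampling estimator as a generic baseline. The function names `count_butterflies` and `estimate_butterflies`, the sampling probability `p`, and the `seed` parameter are assumptions introduced here for illustration only; the estimator further assumes the stream contains no duplicate edges.

```python
import random
from collections import defaultdict
from itertools import combinations
from math import comb


def count_butterflies(edges):
    """Exactly count butterflies (2x2 bicliques) in a bipartite edge list.

    Each edge is a (left, right) pair; duplicate edges are ignored.
    Left vertex labels must be mutually comparable (e.g. all strings).
    """
    # Map every right vertex to the set of left vertices adjacent to it.
    left_neighbors = defaultdict(set)
    for u, v in set(edges):
        left_neighbors[v].add(u)

    # For every pair of left vertices, count their common right neighbors.
    common = defaultdict(int)
    for lefts in left_neighbors.values():
        for pair in combinations(sorted(lefts), 2):
            common[pair] += 1

    # A left pair with c common right neighbors forms C(c, 2) butterflies.
    return sum(comb(c, 2) for c in common.values())


def estimate_butterflies(edge_stream, p, seed=0):
    """Naive baseline (assumed here, not the article's estimator).

    Keeps each streamed edge independently with probability p, counts
    butterflies in the sample, and rescales by 1 / p**4, since all four
    edges of a butterfly must survive the sampling.
    """
    rng = random.Random(seed)
    sample = [edge for edge in edge_stream if rng.random() < p]
    return count_butterflies(sample) / p ** 4


if __name__ == "__main__":
    # Complete bipartite graph K_{3,3}: every pair of left vertices
    # combined with every pair of right vertices is a butterfly.
    k33 = [(u, v) for u in "abc" for v in "xyz"]
    print(count_butterflies(k33))
    print(estimate_butterflies(k33, p=0.8))
```

The exact counter stores the whole graph, which is the memory cost that streaming methods aim to avoid. The sampling baseline trades that memory for variance, which grows quickly as `p` shrinks because of the `1 / p**4` scaling.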

Original language: English
Pages (from-to): 5621-5635
Number of pages: 15
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 34
Issue number: 12
Publication status: Published - 1 Dec 2022

Keywords

  • Butterfly count approximation
  • Bipartite graph stream
