TY - JOUR
T1 - Approximately Counting Butterflies in Large Bipartite Graph Streams
AU - Li, Rundong
AU - Wang, Pinghui
AU - Jia, Peng
AU - Zhang, Xiangliang
AU - Zhao, Junzhou
AU - Tao, Jing
AU - Yuan, Ye
AU - Guan, Xiaohong
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2022/12/1
Y1 - 2022/12/1
N2 - Bipartite graphs widely exist in real-world scenarios and model binary relations like host-website, author-paper, and user-product. In bipartite graphs, a butterfly (i.e., $2\times 2$2×2 bi-clique) is the smallest non-trivial cohesive structure and plays an important role in applications such as anomaly detection. Considerable efforts focus on counting butterflies in static bipartite graphs. However, they suffer from high time and space complexity when the bipartite graph of interest is given as a stream of edges. Although there are methods for approximately counting butterflies from bipartite graph streams, they suffer from either low accuracy or high time complexity. Therefore, it is still a challenge to accurately estimate butterfly counts from bipartite graph streams in a short time. To address this issue, we develop novel algorithms by exploiting the bipartite nature, which subtly integrates sampling and sketching techniques. We provide accurate estimators for butterfly counts and derive simple yet exact formulas for bounding their errors. We also conduct extensive experiments on a variety of real-world large bipartite graphs. Experimental results demonstrate that our algorithms are up to 20.0 times more accurate and up to 286.3 times faster than state-of-the-art methods under the same memory usage.
AB - Bipartite graphs widely exist in real-world scenarios and model binary relations like host-website, author-paper, and user-product. In bipartite graphs, a butterfly (i.e., $2\times 2$2×2 bi-clique) is the smallest non-trivial cohesive structure and plays an important role in applications such as anomaly detection. Considerable efforts focus on counting butterflies in static bipartite graphs. However, they suffer from high time and space complexity when the bipartite graph of interest is given as a stream of edges. Although there are methods for approximately counting butterflies from bipartite graph streams, they suffer from either low accuracy or high time complexity. Therefore, it is still a challenge to accurately estimate butterfly counts from bipartite graph streams in a short time. To address this issue, we develop novel algorithms by exploiting the bipartite nature, which subtly integrates sampling and sketching techniques. We provide accurate estimators for butterfly counts and derive simple yet exact formulas for bounding their errors. We also conduct extensive experiments on a variety of real-world large bipartite graphs. Experimental results demonstrate that our algorithms are up to 20.0 times more accurate and up to 286.3 times faster than state-of-the-art methods under the same memory usage.
KW - Butterfly count approximation
KW - bipartite graph stream
UR - http://www.scopus.com/inward/record.url?scp=85102313954&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2021.3062987
DO - 10.1109/TKDE.2021.3062987
M3 - Article
AN - SCOPUS:85102313954
SN - 1041-4347
VL - 34
SP - 5621
EP - 5635
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 12
ER -