Efficiently Counting Triangles for Hypergraph Streams by Reservoir-Based Sampling

Lingling Zhang, Zhiwei Zhang*, Guoren Wang*, Ye Yuan, Kangfei Zhao

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

Abstract

Hypergraph streams provide an efficient model to express and preserve complex connections in various applications. Triangles in a hypergraph can be formed by vertices and hyperedges. The specific counts of triangles are important to analyze various applications. Due to the huge costs of counting triangles based on the whole datasets, a sampling-and-estimating framework has low overhead while obtaining a relatively accurate result. However, existing sampling algorithms focus on pairwise graph streams, they estimate the counts of triangles formed by vertices with large estimation errors and can not be applied to count triangles formed by hyperedges. Therefore, this paper first proposes a sampling-and-estimating framework that produces hyperedge samples using a reservoir with static capacity to estimate the total counts of triangles by inferring the probabilities of forming the triangles respectively. Furthermore, to improve the estimation accuracy, this paper proposes another sampling-and-estimating framework to produce samples in the form of hyperedge pairs which can be used to compute the probabilities of the formations of triangles more accurately and then estimate the total triangle counts with smaller estimation variances. The extensive experiments based on real-world datasets confirm the efficiency and accuracy of our proposed frameworks for counting triangles in different types of hypergraphs at a small sampling ratio.

Original languageEnglish
Pages (from-to)11328-11341
Number of pages14
JournalIEEE Transactions on Knowledge and Data Engineering
Volume35
Issue number11
DOIs
Publication statusPublished - 1 Nov 2023

Keywords

  • Hypergraph streams
  • reservoir-based sampling
  • social networks
  • triangles counting

Fingerprint

Dive into the research topics of 'Efficiently Counting Triangles for Hypergraph Streams by Reservoir-Based Sampling'. Together they form a unique fingerprint.

Cite this