Abstract
Hypergraph streams provide an efficient model to express and preserve complex connections in various applications. Triangles in a hypergraph can be formed by vertices and hyperedges. The specific counts of triangles are important to analyze various applications. Due to the huge costs of counting triangles based on the whole datasets, a sampling-and-estimating framework has low overhead while obtaining a relatively accurate result. However, existing sampling algorithms focus on pairwise graph streams, they estimate the counts of triangles formed by vertices with large estimation errors and can not be applied to count triangles formed by hyperedges. Therefore, this paper first proposes a sampling-and-estimating framework that produces hyperedge samples using a reservoir with static capacity to estimate the total counts of triangles by inferring the probabilities of forming the triangles respectively. Furthermore, to improve the estimation accuracy, this paper proposes another sampling-and-estimating framework to produce samples in the form of hyperedge pairs which can be used to compute the probabilities of the formations of triangles more accurately and then estimate the total triangle counts with smaller estimation variances. The extensive experiments based on real-world datasets confirm the efficiency and accuracy of our proposed frameworks for counting triangles in different types of hypergraphs at a small sampling ratio.
Original language | English |
---|---|
Pages (from-to) | 11328-11341 |
Number of pages | 14 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 35 |
Issue number | 11 |
DOIs | |
Publication status | Published - 1 Nov 2023 |
Keywords
- Hypergraph streams
- reservoir-based sampling
- social networks
- triangles counting