TY - GEN
T1 - Efficiently Sampling and Estimating Hypergraphs By Hybrid Random Walk
AU - Zhang, Lingling
AU - Zhang, Zhiwei
AU - Wang, Guoren
AU - Yuan, Ye
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Hypergraphs provide a powerful tool for representing group interactions in complicated networks. Analyzing statical properties of hypergraphs by sampling is an increasing fundamental research problem in the field of data processing. However, the state-of-the-art sampling methods either focus on pairwise graphs or are insensitive to the structures formed by vertices and hyperedges, resulting in estimations with low accuracy and efficiency. To efficiently characterize the properties of both vertices and hyperedges, this paper first proposes a hybrid random walk based Markov Chain Monte Carlo (MCMC) model theoretically by carefully designing its mixture states and the transition matrix. For simplifying the implementation of this model, we develop an algorithm formed by vertex and hyperedge transitions saving costs for constructing mixture states in practice along with an estimating method for accurate estimations. Furthermore, we employ a non-backtracking strategy in the vertex transitions to accelerate the convergence of the hybrid random walk and propose to skip the sampled vertices in the hyperedge transitions to avoid being trapped in the local subgraph for improving accuracy and reducing query cost. Extensive experimental results on the real-world datasets confirm the higher accuracy and efficiency of our proposed methods than the sophisticated sampling methods.
AB - Hypergraphs provide a powerful tool for representing group interactions in complicated networks. Analyzing statical properties of hypergraphs by sampling is an increasing fundamental research problem in the field of data processing. However, the state-of-the-art sampling methods either focus on pairwise graphs or are insensitive to the structures formed by vertices and hyperedges, resulting in estimations with low accuracy and efficiency. To efficiently characterize the properties of both vertices and hyperedges, this paper first proposes a hybrid random walk based Markov Chain Monte Carlo (MCMC) model theoretically by carefully designing its mixture states and the transition matrix. For simplifying the implementation of this model, we develop an algorithm formed by vertex and hyperedge transitions saving costs for constructing mixture states in practice along with an estimating method for accurate estimations. Furthermore, we employ a non-backtracking strategy in the vertex transitions to accelerate the convergence of the hybrid random walk and propose to skip the sampled vertices in the hyperedge transitions to avoid being trapped in the local subgraph for improving accuracy and reducing query cost. Extensive experimental results on the real-world datasets confirm the higher accuracy and efficiency of our proposed methods than the sophisticated sampling methods.
UR - http://www.scopus.com/inward/record.url?scp=85167658073&partnerID=8YFLogxK
U2 - 10.1109/ICDE55515.2023.00102
DO - 10.1109/ICDE55515.2023.00102
M3 - Conference contribution
AN - SCOPUS:85167658073
T3 - Proceedings - International Conference on Data Engineering
SP - 1273
EP - 1285
BT - Proceedings - 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023
PB - IEEE Computer Society
T2 - 39th IEEE International Conference on Data Engineering, ICDE 2023
Y2 - 3 April 2023 through 7 April 2023
ER -