Random walk on node cliques for high-quality samples to estimate large graphs with high accuracies and low costs

Lingling Zhang*, Fang Wang, Hong Jiang, Dan Feng, Yanwen Xie, Zhiwei Zhang, Guoren Wang

*此作品的通讯作者

科研成果: 期刊稿件文章同行评审

3 引用 (Scopus)

摘要

Random-walk-based sampling is an efficient way to extract and analyze the properties of large and complex graphs representing social networks. However, it is almost impractical for existing random-walk-based sampling schemes to reach the desired node distribution because of the indeterministic sampling budget (i.e., the number of samples or sampling steps) required for doing so with large volumes of data in graphs. On the other hand, under a small sampling budget, these methods produce low-quality samples with many repeats and high correlations (i.e., many common attributes), which leads to a large deviation from the desired node distribution and large estimation errors. In this paper, we propose a new random-walk sampling scheme based on node cliques (a subset of cliques), called node-clique random walk, or NCRW, to strike a good balance between the estimation error and the sampling budget, by producing unique samples with low correlations. Meanwhile, both the deviation from the desired node distribution and the estimation errors under the constraint of the sampling budget are reduced both theoretically and experimentally. Thus, the sampling costs which are closely related to the sampling budget are reduced. Our extensive experimental evaluation driven by real-world datasets further confirms that NCRW significantly increases the quality of samples and accuracy of estimations with much lower costs than those of existing random-walk-based sampling schemes especially in estimating the higher-order node attributes.

源语言英语
页(从-至)1909-1935
页数27
期刊Knowledge and Information Systems
64
7
DOI
出版状态已出版 - 7月 2022

指纹

探究 'Random walk on node cliques for high-quality samples to estimate large graphs with high accuracies and low costs' 的科研主题。它们共同构成独一无二的指纹。

引用此