TY - GEN
T1 - On random walk based graph sampling
AU - Li, Rong Hua
AU - Yu, Jeffrey Xu
AU - Qin, Lu
AU - Mao, Rui
AU - Jin, Tan
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/5/26
Y1 - 2015/5/26
N2 - Random walk based graph sampling has been recognized as a fundamental technique to collect uniform node samples from a large graph. In this paper, we first present a comprehensive analysis of the drawbacks of three widely-used random walk based graph sampling algorithms, called re-weighted random walk (RW) algorithm, Metropolis-Hastings random walk (MH) algorithm and maximum-degree random walk (MD) algorithm. Then, to address the limitations of these algorithms, we propose two general random walk based algorithms, named rejection-controlled Metropolis-Hastings (RCMH) algorithm and generalized maximum-degree random walk (GMD) algorithm. We show that RCMH balances the tradeoff between the limitations of RW and MH, and GMD balances the tradeoff between the drawbacks of RW and MD. To further improve the performance of our algorithms, we integrate the so-called delayed acceptance technique and the non-backtracking random walk technique into RCMH and GMD respectively. We conduct extensive experiments over four real-world datasets, and the results demonstrate the effectiveness of the proposed algorithms.
AB - Random walk based graph sampling has been recognized as a fundamental technique to collect uniform node samples from a large graph. In this paper, we first present a comprehensive analysis of the drawbacks of three widely-used random walk based graph sampling algorithms, called re-weighted random walk (RW) algorithm, Metropolis-Hastings random walk (MH) algorithm and maximum-degree random walk (MD) algorithm. Then, to address the limitations of these algorithms, we propose two general random walk based algorithms, named rejection-controlled Metropolis-Hastings (RCMH) algorithm and generalized maximum-degree random walk (GMD) algorithm. We show that RCMH balances the tradeoff between the limitations of RW and MH, and GMD balances the tradeoff between the drawbacks of RW and MD. To further improve the performance of our algorithms, we integrate the so-called delayed acceptance technique and the non-backtracking random walk technique into RCMH and GMD respectively. We conduct extensive experiments over four real-world datasets, and the results demonstrate the effectiveness of the proposed algorithms.
UR - http://www.scopus.com/inward/record.url?scp=84940877644&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2015.7113345
DO - 10.1109/ICDE.2015.7113345
M3 - Conference contribution
AN - SCOPUS:84940877644
T3 - Proceedings - International Conference on Data Engineering
SP - 927
EP - 938
BT - 2015 IEEE 31st International Conference on Data Engineering, ICDE 2015
PB - IEEE Computer Society
T2 - 2015 31st IEEE International Conference on Data Engineering, ICDE 2015
Y2 - 13 April 2015 through 17 April 2015
ER -