TY - GEN
T1 - Efficient estimation of heat kernel pagerank for local clustering
AU - Yang, Renchi
AU - Bhowmick, Sourav S.
AU - Xiao, Xiaokui
AU - Zhao, Jun
AU - Wei, Zhewei
AU - Li, Rong Hua
N1 - Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/6/25
Y1 - 2019/6/25
N2 - Given an undirected graph G and a seed node s, the local clustering problem aims to identify a high-quality cluster containing s in time roughly proportional to the size of the cluster, regardless of the size of G. This problem finds numerous applications on large-scale graphs. Recently, heat kernel PageRank (HKPR), which is a measure of the proximity of nodes in graphs, is applied to this problem and found to be more efficient compared with prior methods. However, existing solutions for computing HKPR either are prohibitively expensive or provide unsatisfactory error approximation on HKPR values, rendering them impractical especially on billion-edge graphs. In this paper, we present TEA and TEA+, two novel local graph clustering algorithms based on HKPR, to address the aforementioned limitations. Specifically, these algorithms provide non-trivial theoretical guarantees in relative error of HKPR values and the time complexity. The basic idea is to utilize deterministic graph traversal to produce a rough estimation of exact HKPR vector, and then exploit Monte-Carlo random walks to refine the results in an optimized and non-trivial way. In particular, TEA+ offers practical efficiency and effectiveness due to non-trivial optimizations. Extensive experiments on real-world datasets demonstrate that TEA+ outperforms the state-of-the-art algorithm by more than four times on most benchmark datasets in terms of computational time when achieving the same clustering quality, and in particular, is an order of magnitude faster on large graphs including the widely studied Twitter and Friendster datasets.
AB - Given an undirected graph G and a seed node s, the local clustering problem aims to identify a high-quality cluster containing s in time roughly proportional to the size of the cluster, regardless of the size of G. This problem finds numerous applications on large-scale graphs. Recently, heat kernel PageRank (HKPR), which is a measure of the proximity of nodes in graphs, is applied to this problem and found to be more efficient compared with prior methods. However, existing solutions for computing HKPR either are prohibitively expensive or provide unsatisfactory error approximation on HKPR values, rendering them impractical especially on billion-edge graphs. In this paper, we present TEA and TEA+, two novel local graph clustering algorithms based on HKPR, to address the aforementioned limitations. Specifically, these algorithms provide non-trivial theoretical guarantees in relative error of HKPR values and the time complexity. The basic idea is to utilize deterministic graph traversal to produce a rough estimation of exact HKPR vector, and then exploit Monte-Carlo random walks to refine the results in an optimized and non-trivial way. In particular, TEA+ offers practical efficiency and effectiveness due to non-trivial optimizations. Extensive experiments on real-world datasets demonstrate that TEA+ outperforms the state-of-the-art algorithm by more than four times on most benchmark datasets in terms of computational time when achieving the same clustering quality, and in particular, is an order of magnitude faster on large graphs including the widely studied Twitter and Friendster datasets.
KW - Heat kernel PageRank
KW - Local clustering
UR - http://www.scopus.com/inward/record.url?scp=85069525773&partnerID=8YFLogxK
U2 - 10.1145/3299869.3319886
DO - 10.1145/3299869.3319886
M3 - Conference contribution
AN - SCOPUS:85069525773
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1339
EP - 1356
BT - SIGMOD 2019 - Proceedings of the 2019 International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2019 International Conference on Management of Data, SIGMOD 2019
Y2 - 30 June 2019 through 5 July 2019
ER -