TY - GEN
T1 - Lightning Fast and Space Efficient k-clique Counting
AU - Ye, Xiaowei
AU - Li, Rong Hua
AU - Dai, Qiangqiang
AU - Chen, Hongzhi
AU - Wang, Guoren
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/4/25
Y1 - 2022/4/25
N2 - K-clique counting is a fundamental problem in network analysis which has attracted much attention in recent years. Computing the count of k-cliques in a graph for a large k (e.g., k = 8) is often intractable as the number of k-cliques increases exponentially w.r.t. (with respect to) k. Existing exact k-clique counting algorithms are often hard to handle large dense graphs, while sampling-based solutions either require a huge number of samples or consume very high storage space to achieve a satisfactory accuracy. To overcome these limitations, we propose a new framework to estimate the number of k-cliques which integrates both the exact k-clique counting technique and two novel color-based sampling techniques. The key insight of our framework is that we only apply the exact algorithm to compute the k-clique counts in the sparse regions of a graph, and use the proposed sampling-based techniques to estimate the number of k-cliques in the dense regions of the graph. Specifically, we develop two novel dynamic programming based k-color set sampling techniques to efficiently estimate the k-clique counts, where a k-color set contains k nodes with k different colors. Since a k-color set is often a good approximation of a k-clique in the dense regions of a graph, our sampling-based solutions are extremely efficient and accurate. Moreover, the proposed sampling techniques are space efficient which use near-linear space w.r.t. graph size. We conduct extensive experiments to evaluate our algorithms using 8 real-life graphs. The results show that our best algorithm is at least one order of magnitude faster than the state-of-the-art sampling-based solutions (with the same relative error 0.1%) and can be up to three orders of magnitude faster than the state-of-the-art exact algorithm on large graphs.
AB - K-clique counting is a fundamental problem in network analysis which has attracted much attention in recent years. Computing the count of k-cliques in a graph for a large k (e.g., k = 8) is often intractable as the number of k-cliques increases exponentially w.r.t. (with respect to) k. Existing exact k-clique counting algorithms are often hard to handle large dense graphs, while sampling-based solutions either require a huge number of samples or consume very high storage space to achieve a satisfactory accuracy. To overcome these limitations, we propose a new framework to estimate the number of k-cliques which integrates both the exact k-clique counting technique and two novel color-based sampling techniques. The key insight of our framework is that we only apply the exact algorithm to compute the k-clique counts in the sparse regions of a graph, and use the proposed sampling-based techniques to estimate the number of k-cliques in the dense regions of the graph. Specifically, we develop two novel dynamic programming based k-color set sampling techniques to efficiently estimate the k-clique counts, where a k-color set contains k nodes with k different colors. Since a k-color set is often a good approximation of a k-clique in the dense regions of a graph, our sampling-based solutions are extremely efficient and accurate. Moreover, the proposed sampling techniques are space efficient which use near-linear space w.r.t. graph size. We conduct extensive experiments to evaluate our algorithms using 8 real-life graphs. The results show that our best algorithm is at least one order of magnitude faster than the state-of-the-art sampling-based solutions (with the same relative error 0.1%) and can be up to three orders of magnitude faster than the state-of-the-art exact algorithm on large graphs.
KW - cohesive subgraphs
KW - graph sampling
KW - k-clique counting
UR - http://www.scopus.com/inward/record.url?scp=85129825217&partnerID=8YFLogxK
U2 - 10.1145/3485447.3512167
DO - 10.1145/3485447.3512167
M3 - Conference contribution
AN - SCOPUS:85129825217
T3 - WWW 2022 - Proceedings of the ACM Web Conference 2022
SP - 1191
EP - 1202
BT - WWW 2022 - Proceedings of the ACM Web Conference 2022
PB - Association for Computing Machinery, Inc
T2 - 31st ACM World Wide Web Conference, WWW 2022
Y2 - 25 April 2022 through 29 April 2022
ER -