TY - JOUR
T1 - Distributed subgraph counting
T2 - A general approach
AU - Zhang, Hao
AU - Yu, Jeffrey Xu
AU - Zhang, Yikai
AU - Zhao, Kangfei
AU - Cheng, Hong
N1 - Publisher Copyright:
© 2021, VLDB Endowment. All rights reserved.
PY - 2020/7/1
Y1 - 2020/7/1
N2 - In this paper, we study local subgraph counting, which is to count the occurrences of a user-given pattern graph p around every node v in a data graph G, when v matches to a given orbit o in p, where the orbit serves as a center to count p. In general, the orbit can be a node, an edge, or a set of nodes in p. Local subgraph counting has played an important role in characterizing high-order local structures that exhibit in a large graph, and has been widely used in denser and relevant communities mining, graphlet degree distribution, discriminative features selection for link prediction, relational classification and recommendation. In the literature, almost all the existing works support a k-node pattern graph, for k ≤ 5, with either 1 node orbit or 1 edge orbit. Their approaches are difficult to support larger k due to the fact that subgraph counting is to count by subgraph isomorphism. In this work, we develop a new general approach to count any k pattern graphs with any orbits selected. The key idea behind is that we do local subgraph counting by homomorphism counting, which can be solved by relational algebra using joins, group-by and aggregation. By homomorphism counting, we do local subgraph counting by eliminating counts for those that are not subgraph isomorphism matchings from the total count for any possible matchings. We have developed a distributed system named DISC on Spark. Our extensive experiments validate the efficiency of our approach by testing 114 local subgraph counting queries used in the existing work over real graphs, where no existing work can support all.
AB - In this paper, we study local subgraph counting, which is to count the occurrences of a user-given pattern graph p around every node v in a data graph G, when v matches to a given orbit o in p, where the orbit serves as a center to count p. In general, the orbit can be a node, an edge, or a set of nodes in p. Local subgraph counting has played an important role in characterizing high-order local structures that exhibit in a large graph, and has been widely used in denser and relevant communities mining, graphlet degree distribution, discriminative features selection for link prediction, relational classification and recommendation. In the literature, almost all the existing works support a k-node pattern graph, for k ≤ 5, with either 1 node orbit or 1 edge orbit. Their approaches are difficult to support larger k due to the fact that subgraph counting is to count by subgraph isomorphism. In this work, we develop a new general approach to count any k pattern graphs with any orbits selected. The key idea behind is that we do local subgraph counting by homomorphism counting, which can be solved by relational algebra using joins, group-by and aggregation. By homomorphism counting, we do local subgraph counting by eliminating counts for those that are not subgraph isomorphism matchings from the total count for any possible matchings. We have developed a distributed system named DISC on Spark. Our extensive experiments validate the efficiency of our approach by testing 114 local subgraph counting queries used in the existing work over real graphs, where no existing work can support all.
UR - http://www.scopus.com/inward/record.url?scp=85108852873&partnerID=8YFLogxK
U2 - 10.14778/3407790.3407840
DO - 10.14778/3407790.3407840
M3 - Article
AN - SCOPUS:85108852873
SN - 2150-8097
VL - 13
SP - 2493
EP - 2507
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 11
ER -