TY - JOUR
T1 - Learned sketch for subgraph counting
T2 - a holistic approach
AU - Zhao, Kangfei
AU - Yu, Jeffrey Xu
AU - Li, Qiyan
AU - Zhang, Hao
AU - Rong, Yu
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2023/9
Y1 - 2023/9
N2 - Subgraph counting, as a fundamental problem in network analysis, is to count the number of subgraphs in a data graph that match a given query graph by either homomorphism or subgraph isomorphism. The importance of subgraph counting derives from the fact that it provides insights of a large graph, in particular a labeled graph, when a collection of query graphs with different sizes and labels are issued. The problem of counting is challenging. On the one hand, exact counting by enumerating subgraphs is NP-hard. On the other hand, approximate counting by subgraph isomorphism can only support small query graphs over unlabeled graphs. Another way for subgraph counting is to specify it as an SQL query and estimate the cardinality of the query in RDBMS. Existing approaches for cardinality estimation can only support subgraph counting by homomorphism up to some extent, as it is difficult to deal with sampling failure when a query graph becomes large. A question that arises is how we support subgraph counting by machine learning (ML) and deep learning (DL). To devise an ML/DL solution, apart from the query graphs, another issue is to deal with large data graphs by ML/DL, as the existing DL approach for subgraph isomorphism counting can only support small data graphs. In addition, the ML/DL approaches proposed in RDBMS context for approximate query processing and cardinality estimation cannot be used, as subgraph counting is to do complex self-joins over one relation, whereas existing approaches focus on multiple relations. In this work, we propose an active learned sketch for subgraph counting (ALSS) with two main components: a learned sketch for subgraph counting and an active learner. The sketch is constructed by a neural network regression model, and the active learner is to perform model updates based on new arrival test query graphs. Our holistic learning framework supports both undirected graphs and directed graphs, whose nodes and/or edges are associated zero to multiple labels. We conduct extensive experimental studies to confirm the effectiveness and efficiency of ALSS using large real labeled graphs. Moreover, we show that ALSS can assist query optimizers in finding a better query plan for complex multi-way self-joins.
AB - Subgraph counting, as a fundamental problem in network analysis, is to count the number of subgraphs in a data graph that match a given query graph by either homomorphism or subgraph isomorphism. The importance of subgraph counting derives from the fact that it provides insights of a large graph, in particular a labeled graph, when a collection of query graphs with different sizes and labels are issued. The problem of counting is challenging. On the one hand, exact counting by enumerating subgraphs is NP-hard. On the other hand, approximate counting by subgraph isomorphism can only support small query graphs over unlabeled graphs. Another way for subgraph counting is to specify it as an SQL query and estimate the cardinality of the query in RDBMS. Existing approaches for cardinality estimation can only support subgraph counting by homomorphism up to some extent, as it is difficult to deal with sampling failure when a query graph becomes large. A question that arises is how we support subgraph counting by machine learning (ML) and deep learning (DL). To devise an ML/DL solution, apart from the query graphs, another issue is to deal with large data graphs by ML/DL, as the existing DL approach for subgraph isomorphism counting can only support small data graphs. In addition, the ML/DL approaches proposed in RDBMS context for approximate query processing and cardinality estimation cannot be used, as subgraph counting is to do complex self-joins over one relation, whereas existing approaches focus on multiple relations. In this work, we propose an active learned sketch for subgraph counting (ALSS) with two main components: a learned sketch for subgraph counting and an active learner. The sketch is constructed by a neural network regression model, and the active learner is to perform model updates based on new arrival test query graphs. Our holistic learning framework supports both undirected graphs and directed graphs, whose nodes and/or edges are associated zero to multiple labels. We conduct extensive experimental studies to confirm the effectiveness and efficiency of ALSS using large real labeled graphs. Moreover, we show that ALSS can assist query optimizers in finding a better query plan for complex multi-way self-joins.
KW - Active learning
KW - Graph neural network
KW - Subgraph counting
UR - http://www.scopus.com/inward/record.url?scp=85147760813&partnerID=8YFLogxK
U2 - 10.1007/s00778-023-00781-5
DO - 10.1007/s00778-023-00781-5
M3 - Article
AN - SCOPUS:85147760813
SN - 1066-8888
VL - 32
SP - 937
EP - 962
JO - VLDB Journal
JF - VLDB Journal
IS - 5
ER -