TY - JOUR
T1 - Strict and Flexible Rule-Based Graph Repairing
AU - Cheng, Yurong
AU - Chen, Lei
AU - Yuan, Ye
AU - Wang, Guoren
AU - Li, Boyang
AU - Jin, Fusheng
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2022/7/1
Y1 - 2022/7/1
N2 - Real-life graph datasets extracted from the Web are inevitably full of incompleteness, conflicts, and redundancies, so graph data cleaning shows its necessity. Although rules like data dependencies have been widely studied in relational data repairing, very few works exist to repair graph data. In this article, we introduce a repairing semantics for graphs, called Graph-Repairing Rules (GRRs). This semantics can capture the incompleteness, conflicts, and redundancies in graphs and indicate how to correct these errors. However, this graph repairing semantics can only repair the graphs strictly isomorphic to the rule patterns, which decreases the utility of the rules. To overcome this shortcoming, we further propose a flexible rule-based graph repairing semantics (called δ-GRR). We study three fundamental problems associated with both GRRs and δ-GRRs, consistency, implication, and termination, which show whether a given set of rules make sense. Repairing the graph data using GRRs or δ-GRRs involves a problem of finding isomorphic subgraphs of the graph data, which is NP-complete. To efficiently circumvent the complex calculation of subgraph isomorphism, we design a decomposition-and-join strategy to solve this problem. Extensive experiments on real datasets show that our two graph repairing semantics and corresponding repairing algorithms can effectively and efficiently repair real-life graph data.
AB - Real-life graph datasets extracted from the Web are inevitably full of incompleteness, conflicts, and redundancies, so graph data cleaning shows its necessity. Although rules like data dependencies have been widely studied in relational data repairing, very few works exist to repair graph data. In this article, we introduce a repairing semantics for graphs, called Graph-Repairing Rules (GRRs). This semantics can capture the incompleteness, conflicts, and redundancies in graphs and indicate how to correct these errors. However, this graph repairing semantics can only repair the graphs strictly isomorphic to the rule patterns, which decreases the utility of the rules. To overcome this shortcoming, we further propose a flexible rule-based graph repairing semantics (called δ-GRR). We study three fundamental problems associated with both GRRs and δ-GRRs, consistency, implication, and termination, which show whether a given set of rules make sense. Repairing the graph data using GRRs or δ-GRRs involves a problem of finding isomorphic subgraphs of the graph data, which is NP-complete. To efficiently circumvent the complex calculation of subgraph isomorphism, we design a decomposition-and-join strategy to solve this problem. Extensive experiments on real datasets show that our two graph repairing semantics and corresponding repairing algorithms can effectively and efficiently repair real-life graph data.
KW - Rule
KW - flexible
KW - graph repair
KW - repair algorithms
KW - semantics
UR - http://www.scopus.com/inward/record.url?scp=85090451275&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2020.3019817
DO - 10.1109/TKDE.2020.3019817
M3 - Article
AN - SCOPUS:85090451275
SN - 1041-4347
VL - 34
SP - 3521
EP - 3535
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 7
ER -