TY - JOUR
T1 - Schema matching based on SQL statements
AU - Ding, Guohui
AU - Sun, Shasha
AU - Wang, Guoren
N1 - Publisher Copyright: © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2020/3/1
Y1 - 2020/3/1
N2 - Schema matching is a critical step in numerous database applications, such as web data source integration, data warehouse loading, and information exchange among several authorities. In this paper, we propose exploiting the similarities of the SQL statements in query logs to find the correspondences between attributes in the schemas to be matched. We identify three kinds of similarity that benefit schema matching: the similarity of the clauses themselves, the similarity of the frequency with which clauses occur in different SQL statements, and the similarity of statistics about the relationship among clauses. We combine the clauses related to these similarities into a graph, and then transform the task of matching attributes into the problem of matching graphs. By matching the graphs, we obtain a set of attribute sequence pairs, each with a similarity score. Each sequence pair represents a set of correspondences. Next, we use techniques from quadratic programming to decompose the sequence pairs into correspondences, that is, to obtain the similarity score of each correspondence. Finally, an efficient method is used to choose the best correspondence for each attribute from the candidate set. The experimental study shows that the proposed approach is effective and that it performs well when combined with other matchers.
AB - Schema matching is a critical step in numerous database applications, such as web data source integration, data warehouse loading, and information exchange among several authorities. In this paper, we propose exploiting the similarities of the SQL statements in query logs to find the correspondences between attributes in the schemas to be matched. We identify three kinds of similarity that benefit schema matching: the similarity of the clauses themselves, the similarity of the frequency with which clauses occur in different SQL statements, and the similarity of statistics about the relationship among clauses. We combine the clauses related to these similarities into a graph, and then transform the task of matching attributes into the problem of matching graphs. By matching the graphs, we obtain a set of attribute sequence pairs, each with a similarity score. Each sequence pair represents a set of correspondences. Next, we use techniques from quadratic programming to decompose the sequence pairs into correspondences, that is, to obtain the similarity score of each correspondence. Finally, an efficient method is used to choose the best correspondence for each attribute from the candidate set. The experimental study shows that the proposed approach is effective and that it performs well when combined with other matchers.
KW - Database integration
KW - Metadata
KW - Query log
KW - Schema matching
KW - Similarity metric
UR - http://www.scopus.com/inward/record.url?scp=85065743286&partnerID=8YFLogxK
U2 - 10.1007/s10619-019-07268-9
DO - 10.1007/s10619-019-07268-9
M3 - Article
AN - SCOPUS:85065743286
SN - 0926-8782
VL - 38
SP - 193
EP - 226
JO - Distributed and Parallel Databases
JF - Distributed and Parallel Databases
IS - 1
ER -