Schema matching based on SQL statements

Guohui Ding*, Shasha Sun, Guoren Wang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

4 Citations (Scopus)

Abstract

Schema matching is a critical step in numerous database applications, such as web data source integration, data warehouse loading, and information exchange among several authorities. In this paper, we propose exploiting the similarities of the SQL statements in query logs to find correspondences between attributes in the schemas to be matched. We identify three kinds of similarity that benefit schema matching: the similarity of the clauses themselves, the similarity of the frequency with which clauses occur in different SQL statements, and the similarity of statistics about the relationships among clauses. We combine the clauses related to these similarities into a graph, and then transform the task of matching attributes into the problem of matching graphs. By matching the graphs, we obtain a set of attribute sequence pairs, each with a similarity score. Each sequence pair represents a set of correspondences. Next, we use techniques from quadratic programming to decompose the sequence pairs into correspondences, that is, to obtain the similarity score of each correspondence. Finally, an efficient method is used to choose the best correspondence for each attribute from the candidate set. The experimental study shows that the proposed approach is effective and that it performs well in combination with other matchers.
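
To make the clause-frequency idea concrete, the following is a minimal toy sketch in Python and not the method of the paper. It covers only the second kind of similarity (how often an attribute occurs in each clause type) and a simple best-match selection; the graph matching and quadratic programming steps are omitted. It assumes each logged statement has already been parsed into a mapping from clause type to attribute names. The clause set, the function names, the cosine measure, and the sample logs are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical clause types tracked per attribute (an assumption of this sketch).
CLAUSES = ("select", "where", "group_by", "order_by")


def clause_profile(log):
    """Map each attribute to the fraction of statements that use it per clause type."""
    counts = {}
    for stmt in log:
        for clause in CLAUSES:
            # dict.fromkeys removes duplicates while keeping a stable order.
            for attr in dict.fromkeys(stmt.get(clause, ())):
                counts.setdefault(attr, Counter())[clause] += 1
    n = len(log)
    return {attr: [c[cl] / n for cl in CLAUSES] for attr, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_correspondences(log_a, log_b):
    """For each attribute seen in log_a, pick the most similar attribute in log_b."""
    profiles_a, profiles_b = clause_profile(log_a), clause_profile(log_b)
    if not profiles_b:
        return {}
    result = {}
    for attr_a, vec_a in profiles_a.items():
        best = max(profiles_b, key=lambda b: cosine(vec_a, profiles_b[b]))
        result[attr_a] = (best, cosine(vec_a, profiles_b[best]))
    return result


# Made-up, pre-parsed query logs for two schemas.
log_a = [
    {"select": ["name"], "where": ["emp_id"]},
    {"select": ["name", "salary"], "where": ["emp_id"]},
    {"select": ["salary"], "group_by": ["dept"], "order_by": ["salary"]},
]
log_b = [
    {"select": ["full_name"], "where": ["id"]},
    {"select": ["full_name", "pay"], "where": ["id"]},
    {"select": ["pay"], "group_by": ["division"], "order_by": ["pay"]},
    {"select": ["full_name"], "where": ["id"]},
]

matches = best_correspondences(log_a, log_b)
for attr, (partner, score) in matches.items():
    print(f"{attr} -> {partner} ({score:.3f})")
```

On these toy logs the usage profiles alone pair `name` with `full_name` and `salary` with `pay`. Attributes with identical profiles would tie, which suggests why a single frequency signal is unlikely to suffice on its own and why the abstract combines several kinds of similarity.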

Original language: English
Pages (from-to): 193-226
Number of pages: 34
Journal: Distributed and Parallel Databases
Volume: 38
Issue number: 1
Publication status: Published - 1 Mar 2020
