Schema matching based on SQL statements

Guohui Ding*, Shasha Sun, Guoren Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Schema matching is a critical step in numerous database applications, such as integrating web data sources, loading data warehouses, and exchanging information among several authorities. In this paper, we propose exploiting the similarities between SQL statements in query logs to find correspondences between the attributes of the schemas to be matched. We identify three kinds of similarity that benefit schema matching: the similarity of the clauses themselves, the similarity of the frequency with which clauses occur in different SQL statements, and the similarity of statistics about the relationships among clauses. We combine the clauses related to these similarities into a graph, and thereby transform the task of matching attributes into a graph-matching problem. Matching the graphs yields a set of attribute sequence pairs, each with a similarity score; each sequence pair represents a set of correspondences. Next, we use techniques from quadratic programming to decompose the sequence pairs into individual correspondences, that is, to obtain a similarity score for each correspondence. Finally, an efficient method chooses the best correspondence for each attribute from the candidate set. The experimental study shows that the proposed approach is effective and that it performs well in combination with other matchers.
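
As a rough illustration of the second kind of similarity (how often attributes occur in each clause across a query log), the following Python sketch compares clause-frequency profiles with cosine similarity and picks the highest-scoring candidate for each attribute. It is not the paper's algorithm: the pre-parsed log format, the function names `clause_profiles`, `cosine`, and `best_matches`, and the toy logs are all assumptions made for this example, and the graph-matching and quadratic-programming steps are omitted entirely.

```python
from collections import Counter, defaultdict
from math import sqrt


def clause_profiles(log):
    """Map each attribute to the fraction of statements using it in each clause.

    `log` is a hypothetical pre-parsed query log: a list of statements, each a
    dict from clause name to the attributes appearing in that clause.
    """
    counts = defaultdict(Counter)
    for statement in log:
        for clause, attributes in statement.items():
            for attribute in set(attributes):  # count once per statement
                counts[attribute][clause] += 1
    total = len(log)
    return {a: {c: n / total for c, n in cs.items()} for a, cs in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse clause-frequency vectors."""
    dot = sum(v * q.get(c, 0.0) for c, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def best_matches(log_a, log_b):
    """For each attribute seen in log_a, pick the most similar attribute in log_b."""
    pa, pb = clause_profiles(log_a), clause_profiles(log_b)
    if not pb:
        return {}
    return {a: max(pb, key=lambda b: cosine(pa[a], pb[b])) for a in pa}


# Toy logs for two schemas whose attribute names differ (illustrative data only).
log_a = [
    {"select": ["name"], "where": ["dept"]},
    {"select": ["name"], "where": ["dept"]},
    {"select": ["name", "salary"], "order by": ["salary"]},
]
log_b = [
    {"select": ["ename"], "where": ["dno"]},
    {"select": ["ename", "pay"], "order by": ["pay"]},
    {"select": ["ename"], "where": ["dno"]},
]

matches = best_matches(log_a, log_b)
```

In this toy setting, attributes used in the same clauses at the same rates are paired, regardless of their names. A simple argmax like this can assign the same target to several source attributes, which is one reason a real matcher needs a more careful selection step.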

Original language: English
Pages (from-to): 193-226
Number of pages: 34
Journal: Distributed and Parallel Databases
Volume: 38
Issue number: 1
Publication status: Published - 1 Mar 2020

Keywords

  • Database integration
  • Metadata
  • Query log
  • Schema matching
  • Similarity metric
