Schema matching based on SQL statements

Guohui Ding; Shasha Sun; Guoren Wang

doi:10.1007/s10619-019-07268-9

Guohui Ding^*, Shasha Sun, Guoren Wang

^*Corresponding author for this work

School of Computer Science

Shenyang Aerospace University

Research output: Contribution to journal › Article › Peer-review

4 Citations (Scopus)

Abstract

Schema matching is a critical step in numerous database applications such as web data source integration, data warehouse loading and information exchange among several authorities. In this paper, we propose to exploit the similarities of the SQL statements in query logs to find the correspondences between attributes in the schemas to be matched. We discover three kinds of similarities that benefit schema matching: the similarity of the clauses themselves, the similarity of the frequencies with which clauses occur in different SQL statements, and the similarity of statistics about the relationships among clauses. We combine the clauses related to these similarities into a graph, and then transform the task of matching attributes into the problem of matching the graphs. By matching the graphs, we obtain a set of attribute sequence pairs, each with a similarity score. In fact, each sequence pair represents a set of correspondences. Next, we exploit techniques from quadratic programming to decompose the sequence pairs into correspondences, that is, to obtain the similarity score of each correspondence. Finally, an efficient method is used to choose the best correspondence for each attribute from the candidate set. The experimental study shows that the proposed approach is effective and that its combination with other matchers performs well.
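The last two steps described in the abstract (splitting each scored attribute sequence pair into per-correspondence scores with quadratic programming, then choosing one correspondence per attribute) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the input format, the assumption that a pair's score is the mean of its correspondences' scores, the projected-gradient solver, the greedy selector, the 0.5 threshold and all attribute names are hypothetical stand-ins chosen only to make the idea concrete.

```python
from collections import defaultdict


def decompose_scores(seq_pairs, iters=500):
    """Estimate a score in [0, 1] for each attribute correspondence.

    seq_pairs: list of (src_seq, tgt_seq, score); position i of src_seq is
    assumed to correspond to position i of tgt_seq (hypothetical format).
    Assumed model: a pair's score is the mean of its correspondences' scores,
    so we minimise sum_p (mean_{c in p} x_c - s_p)^2 with 0 <= x_c <= 1,
    a box-constrained convex QP, solved here by projected gradient descent.
    """
    pairs = [(list(zip(src, tgt)), s) for src, tgt, s in seq_pairs if src and tgt]
    corrs = sorted({c for cs, _ in pairs for c in cs})
    if not corrs:
        return {}
    # Step size 1/L, where L = 2 * (max summed weight 1/len(p) over each c's
    # occurrences) upper-bounds the Lipschitz constant of the gradient.
    weight = defaultdict(float)
    for cs, _ in pairs:
        for c in cs:
            weight[c] += 1 / len(cs)
    step = 1 / (2 * max(weight.values()))
    x = {c: 0.5 for c in corrs}
    for _ in range(iters):
        grad = defaultdict(float)
        for cs, s in pairs:
            residual = sum(x[c] for c in cs) / len(cs) - s
            for c in cs:
                grad[c] += 2 * residual / len(cs)
        # Gradient step, then project each score back onto [0, 1].
        x = {c: min(1.0, max(0.0, x[c] - step * grad[c])) for c in corrs}
    return x


def select_best(scores, threshold=0.5):
    """Greedy one-to-one selection (a hypothetical stand-in selector)."""
    used_src, used_tgt, result = set(), set(), {}
    for (src, tgt), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s >= threshold and src not in used_src and tgt not in used_tgt:
            result[src] = tgt
            used_src.add(src)
            used_tgt.add(tgt)
    return result


# Hypothetical scored sequence pairs, as if produced by graph matching.
seq_pairs = [
    (("cust_id", "cust_name"), ("client_no", "client_nm"), 0.9),
    (("cust_id",), ("client_no",), 0.95),
    (("cust_name",), ("client_nm",), 0.85),
    (("cust_name",), ("client_no",), 0.1),
]
scores = decompose_scores(seq_pairs)
matches = select_best(scores)
print(matches)  # {'cust_id': 'client_no', 'cust_name': 'client_nm'}
```

In this toy input the two-attribute pair's score (0.9) is exactly the mean of the two single-attribute scores, so the sketch recovers 0.95 and 0.85 for those correspondences. A real implementation could swap in any bounded least-squares or QP solver for the hand-written projected gradient loop.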

Original language	English
Pages (from-to)	193-226
Number of pages	34
Journal	Distributed and Parallel Databases
Volume	38
Issue number	1
DOI	https://doi.org/10.1007/s10619-019-07268-9
Publication status	Published - 1 Mar 2020

Cite this

@article{05ed4494bcb44806bdd0d1b9eb4dee79,

title = "Schema matching based on SQL statements",

abstract = "Schema matching is a critical step in numerous database applications such as web data sources integrating, data warehouse loading and information exchanging among several authorities. In this paper, we propose to exploit the similarities of the SQL statements in the query logs to find the correspondences between attributes in the schemas to be matched. We discover three kinds of similarities which benefit schema matching, that is, the similarity of clauses itself, the similarity of the frequency of clauses occurring in different SQL statements and the similarity of statistics about the relationship among clauses. We combine the clauses related to the similarities into a graph, and then transform the task of matching attributes into the problem of matching the graphs. Through matching the graphs, we obtain a set of attribute sequence pairs with the similarity score. Actually, each sequence pair represents a set of correspondences. Next, we exploit the techniques from the quadratic programming field to decompose the sequence pairs into correspondences, that is, to obtain the similarity score of each correspondence. Finally, an efficient method is used to choose the best correspondence for each attribute from the candidate set. The experimental study shows that the proposed approach is effective and its combination with other matchers has good performance.",

keywords = "Database integration, Metadata, Query log, Schema matching, Similarity metric",

author = "Guohui Ding and Shasha Sun and Guoren Wang",

note = "Publisher Copyright: {\textcopyright} 2019, Springer Science+Business Media, LLC, part of Springer Nature.",

year = "2020",

month = mar,

day = "1",

doi = "10.1007/s10619-019-07268-9",

language = "English",

volume = "38",

pages = "193--226",

journal = "Distributed and Parallel Databases",

issn = "0926-8782",

publisher = "Springer Netherlands",

number = "1",

}

TY  - JOUR
T1  - Schema matching based on SQL statements
AU  - Ding, Guohui
AU  - Sun, Shasha
AU  - Wang, Guoren
PY  - 2020/3/1
Y1  - 2020/3/1
N2  - Schema matching is a critical step in numerous database applications such as web data sources integrating, data warehouse loading and information exchanging among several authorities. In this paper, we propose to exploit the similarities of the SQL statements in the query logs to find the correspondences between attributes in the schemas to be matched. We discover three kinds of similarities which benefit schema matching, that is, the similarity of clauses itself, the similarity of the frequency of clauses occurring in different SQL statements and the similarity of statistics about the relationship among clauses. We combine the clauses related to the similarities into a graph, and then transform the task of matching attributes into the problem of matching the graphs. Through matching the graphs, we obtain a set of attribute sequence pairs with the similarity score. Actually, each sequence pair represents a set of correspondences. Next, we exploit the techniques from the quadratic programming field to decompose the sequence pairs into correspondences, that is, to obtain the similarity score of each correspondence. Finally, an efficient method is used to choose the best correspondence for each attribute from the candidate set. The experimental study shows that the proposed approach is effective and its combination with other matchers has good performance.
AB  - Schema matching is a critical step in numerous database applications such as web data sources integrating, data warehouse loading and information exchanging among several authorities. In this paper, we propose to exploit the similarities of the SQL statements in the query logs to find the correspondences between attributes in the schemas to be matched. We discover three kinds of similarities which benefit schema matching, that is, the similarity of clauses itself, the similarity of the frequency of clauses occurring in different SQL statements and the similarity of statistics about the relationship among clauses. We combine the clauses related to the similarities into a graph, and then transform the task of matching attributes into the problem of matching the graphs. Through matching the graphs, we obtain a set of attribute sequence pairs with the similarity score. Actually, each sequence pair represents a set of correspondences. Next, we exploit the techniques from the quadratic programming field to decompose the sequence pairs into correspondences, that is, to obtain the similarity score of each correspondence. Finally, an efficient method is used to choose the best correspondence for each attribute from the candidate set. The experimental study shows that the proposed approach is effective and its combination with other matchers has good performance.
KW  - Database integration
KW  - Metadata
KW  - Query log
KW  - Schema matching
KW  - Similarity metric
UR  - http://www.scopus.com/inward/record.url?scp=85065743286&partnerID=8YFLogxK
U2  - 10.1007/s10619-019-07268-9
DO  - 10.1007/s10619-019-07268-9
M3  - Article
AN  - SCOPUS:85065743286
SN  - 0926-8782
VL  - 38
SP  - 193
EP  - 226
JO  - Distributed and Parallel Databases
JF  - Distributed and Parallel Databases
IS  - 1
ER  - 
