
Automatic discovery of attributes in relational databases

  • Meihui Zhang*
  • Marios Hadjieleftheriou
  • Beng Chin Ooi
  • Cecilia M. Procopiuc
  • Divesh Srivastava
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this work we design algorithms for clustering relational columns into attributes, i.e., for identifying strong relationships between columns based on the common properties and characteristics of the values they contain. For example, we aim to identify whether a certain set of columns refers to telephone numbers versus social security numbers, or to names of customers versus names of nations. Traditional relational database schema languages use very limited primitive data types and simple foreign key constraints to express relationships between columns. Object-oriented schema languages allow the definition of custom data types; still, certain relationships between columns might be unknown at design time or they might appear only in a particular database instance. Nevertheless, these relationships are an invaluable tool for schema matching, and generally for better understanding and working with the data. Here, we introduce data-oriented solutions (we do not consider solutions that assume the existence of any external knowledge) that use statistical measures to identify strong relationships between the values of a set of columns. Interpreting the database as a graph where nodes correspond to database columns and edges correspond to column relationships, we decompose the graph into connected components and cluster sets of columns into attributes. To test the quality of our solution, we also provide a comprehensive experimental evaluation using real and synthetic datasets.
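
The graph view described in the abstract can be illustrated with a small sketch. The abstract does not name the statistical measure or any thresholds, so everything specific here is an assumption made for illustration: Jaccard similarity of value sets stands in for the measure, and the `threshold` parameter, the function names, and the sample columns are hypothetical. It is not the authors' algorithm, only one simple instance of "build a column graph, then take connected components".

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two value sets (an assumed stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_columns(columns, threshold=0.4):
    """Group column names into connected components of a similarity graph.

    columns: dict mapping a column name to an iterable of its values.
    threshold: hypothetical cutoff at or above which two columns share an edge.
    """
    value_sets = {name: set(values) for name, values in columns.items()}
    parent = {name: name for name in value_sets}  # union-find forest

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Add an edge (merge components) for every sufficiently similar pair.
    for u, v in combinations(value_sets, 2):
        if jaccard(value_sets[u], value_sets[v]) >= threshold:
            parent[find(u)] = find(v)

    # Collect columns by the root of their component.
    components = {}
    for name in value_sets:
        components.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in components.values())


# Made-up sample data, for illustration only.
columns = {
    "customer.phone": ["555-0100", "555-0101", "555-0102"],
    "supplier.phone": ["555-0101", "555-0102", "555-0103"],
    "customer.name": ["Ann", "Bob", "Cy"],
    "nation.name": ["Greece", "Peru", "Chad"],
}
clusters = cluster_columns(columns)
```

In this toy input the two phone columns share half of their distinct values and end up in one component, while the two name columns share none and stay separate. A pairwise comparison like this is quadratic in the number of columns, which is acceptable for a sketch but would need pruning on a large schema.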

Original language: English
Title of host publication: Proceedings of SIGMOD 2011 and PODS 2011
Publisher: Association for Computing Machinery
Pages: 109-120
Number of pages: 12
ISBN (Print): 9781450306614
DOI
Publication status: Published - 2011
Externally published: Yes
Event: 2011 ACM SIGMOD and 30th PODS 2011 Conference - Athens, Greece
Duration: 12 Jun 2011 to 16 Jun 2011

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2011 ACM SIGMOD and 30th PODS 2011 Conference
Country/Territory: Greece
City: Athens
Period: 12/06/11 to 16/06/11
