On multi-column foreign key discovery

Meihui Zhang*, Marios Hadjieleftheriou, Beng Chin Ooi, Cecilia M. Procopiuc, Divesh Srivastava

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

77 Citations (Scopus)

Abstract

A foreign/primary key relationship between relational tables is one of the most important constraints in a database. From a data analysis perspective, discovering foreign keys is a crucial step in understanding and working with the data. Nevertheless, more often than not, foreign key constraints are not specified in the data, for various reasons; e.g., some associations are not known to designers but are inherent in the data, while others become invalid due to data inconsistencies. This work proposes a robust algorithm for discovering single-column and multi-column foreign keys. Previous work concentrated mostly on discovering single-column foreign keys using a variety of rules, like inclusion dependencies, column names, and minimum/maximum values. We first propose a general rule, termed Randomness, that subsumes a variety of other rules. We then develop efficient approximation algorithms for evaluating randomness, using only two passes over the data. Finally, we validate our approach via extensive experiments using real and synthetic datasets.
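To make the inclusion-dependency rule mentioned in the abstract concrete, the following is a minimal sketch of how a multi-column foreign key candidate might be screened. It is not the Randomness measure or the two-pass approximation algorithm proposed in the paper; it only illustrates the simpler baseline rule. The table contents, the column names, and the helper functions `project`, `is_unique`, and `inclusion_coverage` are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]


def project(rows: List[Row], columns: Sequence[str]) -> Set[Tuple[object, ...]]:
    """Return the set of distinct value tuples found in the given columns."""
    return {tuple(row[c] for c in columns) for row in rows}


def is_unique(rows: List[Row], columns: Sequence[str]) -> bool:
    """Check whether the column combination could serve as a key (no duplicates)."""
    values = [tuple(row[c] for c in columns) for row in rows]
    return len(values) == len(set(values))


def inclusion_coverage(
    fk_rows: List[Row],
    fk_cols: Sequence[str],
    pk_rows: List[Row],
    pk_cols: Sequence[str],
) -> float:
    """Fraction of distinct candidate foreign key tuples that appear in the key columns.

    A value of 1.0 means the inclusion dependency holds exactly; a lower
    value indicates dangling references, for example from inconsistent data.
    """
    if len(fk_cols) != len(pk_cols):
        raise ValueError("column lists must have the same length")
    fk_values = project(fk_rows, fk_cols)
    if not fk_values:
        return 0.0
    pk_values = project(pk_rows, pk_cols)
    return len(fk_values & pk_values) / len(fk_values)


# Hypothetical data: customers are identified by (region, id) together.
customers = [
    {"region": "EU", "id": 1},
    {"region": "EU", "id": 2},
    {"region": "US", "id": 1},
]
orders = [
    {"order_id": 10, "customer_region": "EU", "customer_id": 1},
    {"order_id": 11, "customer_region": "US", "customer_id": 1},
    {"order_id": 12, "customer_region": "EU", "customer_id": 1},
]

# "id" alone is not a key here, but the pair (region, id) is.
single_column_is_key = is_unique(customers, ["id"])
pair_is_key = is_unique(customers, ["region", "id"])

clean_coverage = inclusion_coverage(
    orders, ["customer_region", "customer_id"], customers, ["region", "id"]
)

# One dangling reference lowers the coverage below 1.0.
dirty_orders = orders + [
    {"order_id": 13, "customer_region": "US", "customer_id": 2}
]
dirty_coverage = inclusion_coverage(
    dirty_orders, ["customer_region", "customer_id"], customers, ["region", "id"]
)
```

In this toy example the single column `id` repeats across regions, so only the two-column combination qualifies as a key. Exact inclusion is also fragile: a single inconsistent row breaks it, which is one reason a candidate that passes or fails this test still needs further evidence before being accepted or rejected as a foreign key.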

Original language: English
Pages (from-to): 805-814
Number of pages: 10
Journal: Proceedings of the VLDB Endowment
Volume: 3
Issue number: 1
Publication status: Published - Sept 2010
Externally published: Yes
