Improving GAN with inverse cumulative distribution function for tabular data synthesis

Ban Li, Senlin Luo, Xiaonan Qin, Limin Pan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

Designing a generative model to synthesize realistic tabular data is of great significance in data science. Existing tabular data generative models have difficulty handling complicated and diverse marginal distribution types due to the gradient vanishing problem, and they pay little attention to the correlation between attributes. We propose a method that improves the generative adversarial network (GAN) with the inverse cumulative distribution function for tabular data synthesis. The method first transforms continuous columns into uniformly distributed data using the cumulative distribution function, which alleviates the gradient vanishing problem in model training. It then trains a GAN on the transformed data, where a discriminator with a label-reconstruction function is introduced to model the correlation among attributes accurately, using an auxiliary supervised task to aid correlation extraction. Finally, we train a neural network for each continuous column to perform the inverse transformation of the generated data into the target distribution, thereby obtaining the synthetic data. Experiments on simulated and real-world datasets show that our method compares favorably against state-of-the-art methods in modeling tabular data.
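The first and last steps of the pipeline above can be sketched with an empirical CDF. This is only an illustrative sketch: the abstract does not give the authors' exact CDF estimator, and it states that the inverse step is performed by a trained neural network per column, which is replaced here by simple quantile interpolation.

```python
import numpy as np

def cdf_transform(column):
    """Map a continuous column to (0, 1) via its empirical CDF.

    Illustrative only: the paper's exact CDF estimator is not
    specified in the abstract; mid-rank values are used here.
    """
    ranks = np.argsort(np.argsort(column))       # rank of each value, 0..n-1
    return (ranks + 0.5) / len(column)           # mid-rank CDF values in (0, 1)

def inverse_transform(u, reference_column):
    """Map uniform samples back to the original marginal distribution.

    The paper trains a neural network per column for this step;
    here the empirical quantile function is linearly interpolated.
    """
    sorted_ref = np.sort(reference_column)
    grid = (np.arange(len(sorted_ref)) + 0.5) / len(sorted_ref)
    return np.interp(u, grid, sorted_ref)

# A skewed marginal becomes roughly uniform after the transform,
# and the inverse transform maps uniform values back to it.
x = np.random.default_rng(0).exponential(size=1000)
u = cdf_transform(x)
x_back = inverse_transform(u, x)
```

Transforming each marginal to a uniform distribution before adversarial training is what smooths the optimization landscape; the GAN only ever sees well-behaved inputs, and the difficult marginal shape is restored afterwards by the inverse map.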

Original language: English
Pages (from-to): 373-383
Number of pages: 11
Journal: Neurocomputing
Volume: 456
DOIs
Publication status: Published - 7 Oct 2021

Keywords

  • Cumulative distribution function
  • Data synthesis
  • Generative adversarial network
  • Tabular data
