Abstract
Designing a generative model that synthesizes realistic tabular data is of great significance in data science. Existing tabular data generative models have difficulty handling complicated and diverse marginal distribution types due to the vanishing gradient problem, and they pay little attention to the correlations between attributes. We propose a method that improves the generative adversarial network (GAN) with an inverse cumulative distribution function for tabular data synthesis. The method first transforms continuous columns into uniformly distributed data using the cumulative distribution function, which alleviates the vanishing gradient problem during training. It then trains a GAN on the transformed data, where a discriminator with a label-reconstruction function is introduced: an auxiliary supervised task that helps the model capture the correlations among attributes accurately. Finally, we train a neural network for each continuous column to invert the generated data back to the target distribution, thereby obtaining the synthetic data. Experiments on simulated and real-world datasets show that our method compares favorably against state-of-the-art methods in modeling tabular data.
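The pre-processing step described above (mapping a continuous column to a uniform distribution via its CDF, then inverting back) can be illustrated with an empirical probability integral transform. This is only a minimal sketch under stated assumptions: the paper learns the inverse with a per-column neural network, whereas here the inverse is approximated by interpolation over the sorted reference column.

```python
import numpy as np

def cdf_transform(col):
    """Map a continuous column to (0, 1) via its empirical CDF
    (probability integral transform). Sketch of the pre-processing
    step; the paper's exact transform may differ."""
    ranks = np.argsort(np.argsort(col))
    # Midpoint-scaled ranks avoid producing exact 0 or 1.
    return (ranks + 0.5) / len(col)

def inverse_transform(u, ref_col):
    """Approximate inverse CDF by interpolating over the sorted
    reference column (the paper trains a neural network for this)."""
    sorted_ref = np.sort(ref_col)
    grid = (np.arange(len(ref_col)) + 0.5) / len(ref_col)
    return np.interp(u, grid, sorted_ref)

# Example: a skewed column becomes roughly uniform, then is recovered.
rng = np.random.default_rng(0)
col = rng.exponential(scale=2.0, size=1000)
u = cdf_transform(col)
recovered = inverse_transform(u, col)
```

Transforming heavy-tailed or multimodal columns to a bounded, uniform range in this way is what gives the GAN a better-conditioned target and mitigates vanishing gradients.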
Original language | English |
---|---|
Pages (from-to) | 373-383 |
Number of pages | 11 |
Journal | Neurocomputing |
Volume | 456 |
DOIs | |
Publication status | Published - 7 Oct 2021 |
Keywords
- Cumulative distribution function
- Data synthesis
- Generative adversarial network
- Tabular data