Abstract
Designing a generative model that synthesizes realistic tabular data is of great significance in data science. Existing tabular data generative models have difficulty handling complicated and diverse marginal distribution types due to the vanishing gradient problem, and they pay little attention to the correlations between attributes. We propose a method that improves the generative adversarial network (GAN) with an inverse cumulative distribution function for tabular data synthesis. The method first transforms continuous columns into uniformly distributed data using the cumulative distribution function, which alleviates the vanishing gradient problem during training. It then trains a GAN on the transformed data, where a discriminator with a label-reconstruction function is introduced: an auxiliary supervised task that helps the model capture the correlations among attributes accurately. Finally, we train a neural network for each continuous column to invert the generated data back to the target distribution, thereby obtaining the synthetic data. Experiments on simulated and real-world datasets show that our method compares favorably against state-of-the-art methods in modeling tabular data.
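The pre-processing step described above (mapping a continuous column to a uniform distribution via its CDF, then inverting back) can be illustrated with an empirical probability integral transform. This is only a minimal sketch under stated assumptions: the paper learns the inverse with a per-column neural network, whereas here the inverse is approximated by interpolation over the sorted reference column.

```python
import numpy as np

def cdf_transform(col):
    """Map a continuous column to (0, 1) via its empirical CDF
    (probability integral transform). Sketch of the pre-processing
    step; the paper's exact transform may differ."""
    ranks = np.argsort(np.argsort(col))
    # Midpoint-scaled ranks avoid producing exact 0 or 1.
    return (ranks + 0.5) / len(col)

def inverse_transform(u, ref_col):
    """Approximate inverse CDF by interpolating over the sorted
    reference column (the paper trains a neural network for this)."""
    sorted_ref = np.sort(ref_col)
    grid = (np.arange(len(ref_col)) + 0.5) / len(ref_col)
    return np.interp(u, grid, sorted_ref)

# Example: a skewed column becomes roughly uniform, then is recovered.
rng = np.random.default_rng(0)
col = rng.exponential(scale=2.0, size=1000)
u = cdf_transform(col)
recovered = inverse_transform(u, col)
```

Transforming heavy-tailed or multimodal columns to a bounded, uniform range in this way is what gives the GAN a better-conditioned target and mitigates vanishing gradients.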
Original language | English |
---|---|
Pages (from-to) | 373-383 |
Number of pages | 11 |
Journal | Neurocomputing |
Volume | 456 |
DOIs | |
Publication status | Published - 7 Oct 2021 |
Keywords
- Cumulative distribution function
- Data synthesis
- Generative adversarial network
- Tabular data