A joint learning method for incomplete and imbalanced data in electronic health record based on generative adversarial networks

Xutao Weng; Hong Song; Yucong Lin; You Wu; Xi Zhang; Bowen Liu; Jian Yang

doi:10.1016/j.compbiomed.2023.107687

Xutao Weng, Hong Song^*, Yucong Lin, You Wu, Xi Zhang, Bowen Liu, Jian Yang^*

^*Corresponding author for this work

Beijing Institute of Technology

Research output: Contribution to journal › Article › Peer-reviewed

1 citation (Scopus)

Abstract

Electronic health records (EHR) present challenges of incomplete and imbalanced data in clinical predictions. Previous studies addressed these two issues separately, in two steps, which degraded the performance of prediction tasks. In this paper, we propose a unified framework that simultaneously addresses the challenges of incomplete and imbalanced data in EHR. Based on the framework, we develop a model called Missing Value Imputation and Imbalanced Learning Generative Adversarial Network (MVIIL-GAN). We use MVIIL-GAN to perform joint learning on the imputation process of data with a high missing rate and the conditional generation process of EHR data. The joint learning is achieved by introducing two discriminators to distinguish the fake data from the generated data at the sample level and the variable level. MVIIL-GAN integrates missing-value imputation and data generation in one step, improving the consistency of parameter optimization and the performance of prediction tasks. We evaluate our framework on the public MIMIC-IV dataset, which contains data with high missing rates and imbalanced data. Experimental results show that MVIIL-GAN outperforms existing methods in prediction performance. The implementation of MVIIL-GAN can be found at https://github.com/Peroxidess/MVIIL-GAN.
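The idea of supervising imputation at two granularities can be illustrated with a small NumPy sketch. It is not taken from the MVIIL-GAN repository: the function names, the constant stand-in for the generator output, and the choice of labels (one label per record for a sample-level discriminator, one label per entry for a variable-level discriminator, in the style of mask-based GAN imputation) are all assumptions made for illustration.

```python
import numpy as np


def combine_observed_and_generated(x, generated):
    """Keep observed entries of x and fill missing (NaN) entries from generated."""
    mask = ~np.isnan(x)                     # True where a value was observed
    imputed = np.where(mask, x, generated)  # observed values pass through unchanged
    return imputed, mask.astype(np.float32)


def discriminator_targets(mask, is_real_sample):
    """Hypothetical training labels for the two discriminators (an assumption)."""
    # Variable level: one label per entry (1 = observed, 0 = imputed).
    variable_level = mask
    # Sample level: one label per record (1 = real record, 0 = generated record).
    sample_level = np.full(
        (mask.shape[0], 1), float(is_real_sample), dtype=np.float32
    )
    return sample_level, variable_level


# Two toy records with missing values encoded as NaN.
x = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, np.nan]])
generated = np.full_like(x, 9.0)  # stand-in for a generator's output

imputed, mask = combine_observed_and_generated(x, generated)
sample_t, variable_t = discriminator_targets(mask, is_real_sample=False)
```

In this sketch the variable-level labels have the same shape as the data matrix, while the sample-level labels have one row per record, which is what lets the two signals be optimized together in a single training step.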

Original language	English
Article number	107687
Journal	Computers in Biology and Medicine
Volume	168
DOI	https://doi.org/10.1016/j.compbiomed.2023.107687
Publication status	Published - Jan 2024

Cite this

@article{81b84314ea314029bfea140e3122d5a4,
title = "A joint learning method for incomplete and imbalanced data in electronic health record based on generative adversarial networks",
abstract = "Electronic health records (EHR) present challenges of incomplete and imbalanced data in clinical predictions. Previous studies addressed these two issues separately, in two steps, which degraded the performance of prediction tasks. In this paper, we propose a unified framework that simultaneously addresses the challenges of incomplete and imbalanced data in EHR. Based on the framework, we develop a model called Missing Value Imputation and Imbalanced Learning Generative Adversarial Network (MVIIL-GAN). We use MVIIL-GAN to perform joint learning on the imputation process of data with a high missing rate and the conditional generation process of EHR data. The joint learning is achieved by introducing two discriminators to distinguish the fake data from the generated data at the sample level and the variable level. MVIIL-GAN integrates missing-value imputation and data generation in one step, improving the consistency of parameter optimization and the performance of prediction tasks. We evaluate our framework on the public MIMIC-IV dataset, which contains data with high missing rates and imbalanced data. Experimental results show that MVIIL-GAN outperforms existing methods in prediction performance. The implementation of MVIIL-GAN can be found at https://github.com/Peroxidess/MVIIL-GAN.",
keywords = "Electronic health records, Generative adversarial networks, Imbalanced learning, Missing values imputation",
author = "Xutao Weng and Hong Song and Yucong Lin and You Wu and Xi Zhang and Bowen Liu and Jian Yang",
note = "Publisher Copyright: {\textcopyright} 2023",
year = "2024",
month = jan,
doi = "10.1016/j.compbiomed.2023.107687",
language = "English",
volume = "168",
journal = "Computers in Biology and Medicine",
issn = "0010-4825",
publisher = "Elsevier Ltd.",
}

TY - JOUR
T1 - A joint learning method for incomplete and imbalanced data in electronic health record based on generative adversarial networks
AU - Weng, Xutao
AU - Song, Hong
AU - Lin, Yucong
AU - Wu, You
AU - Zhang, Xi
AU - Liu, Bowen
AU - Yang, Jian
PY - 2024/1
Y1 - 2024/1
N2 - Electronic health records (EHR) present challenges of incomplete and imbalanced data in clinical predictions. Previous studies addressed these two issues separately, in two steps, which degraded the performance of prediction tasks. In this paper, we propose a unified framework that simultaneously addresses the challenges of incomplete and imbalanced data in EHR. Based on the framework, we develop a model called Missing Value Imputation and Imbalanced Learning Generative Adversarial Network (MVIIL-GAN). We use MVIIL-GAN to perform joint learning on the imputation process of data with a high missing rate and the conditional generation process of EHR data. The joint learning is achieved by introducing two discriminators to distinguish the fake data from the generated data at the sample level and the variable level. MVIIL-GAN integrates missing-value imputation and data generation in one step, improving the consistency of parameter optimization and the performance of prediction tasks. We evaluate our framework on the public MIMIC-IV dataset, which contains data with high missing rates and imbalanced data. Experimental results show that MVIIL-GAN outperforms existing methods in prediction performance. The implementation of MVIIL-GAN can be found at https://github.com/Peroxidess/MVIIL-GAN.
AB - Electronic health records (EHR) present challenges of incomplete and imbalanced data in clinical predictions. Previous studies addressed these two issues separately, in two steps, which degraded the performance of prediction tasks. In this paper, we propose a unified framework that simultaneously addresses the challenges of incomplete and imbalanced data in EHR. Based on the framework, we develop a model called Missing Value Imputation and Imbalanced Learning Generative Adversarial Network (MVIIL-GAN). We use MVIIL-GAN to perform joint learning on the imputation process of data with a high missing rate and the conditional generation process of EHR data. The joint learning is achieved by introducing two discriminators to distinguish the fake data from the generated data at the sample level and the variable level. MVIIL-GAN integrates missing-value imputation and data generation in one step, improving the consistency of parameter optimization and the performance of prediction tasks. We evaluate our framework on the public MIMIC-IV dataset, which contains data with high missing rates and imbalanced data. Experimental results show that MVIIL-GAN outperforms existing methods in prediction performance. The implementation of MVIIL-GAN can be found at https://github.com/Peroxidess/MVIIL-GAN.
KW - Electronic health records
KW - Generative adversarial networks
KW - Imbalanced learning
KW - Missing values imputation
UR - http://www.scopus.com/inward/record.url?scp=85178048792&partnerID=8YFLogxK
U2 - 10.1016/j.compbiomed.2023.107687
DO - 10.1016/j.compbiomed.2023.107687
M3 - Article
C2 - 38007974
AN - SCOPUS:85178048792
SN - 0010-4825
VL - 168
JO - Computers in Biology and Medicine
JF - Computers in Biology and Medicine
M1 - 107687
ER -
