TY - JOUR
T1 - A joint learning method for incomplete and imbalanced data in electronic health record based on generative adversarial networks
AU - Weng, Xutao
AU - Song, Hong
AU - Lin, Yucong
AU - Wu, You
AU - Zhang, Xi
AU - Liu, Bowen
AU - Yang, Jian
N1 - Publisher Copyright: © 2023
PY - 2024/1
Y1 - 2024/1
N2 - Electronic health records (EHR) present challenges of incomplete and imbalanced data in clinical predictions. Previous studies addressed these two issues separately in a two-step process, which degraded the performance of prediction tasks. In this paper, we propose a unified framework that simultaneously addresses the challenges of incomplete and imbalanced data in EHR. Based on this framework, we develop a model called Missing Value Imputation and Imbalanced Learning Generative Adversarial Network (MVIIL-GAN). We use MVIIL-GAN to perform joint learning on the imputation process for data with high missing rates and the conditional generation process for EHR data. The joint learning is achieved by introducing two discriminators that distinguish the fake data from the generated data at the sample level and the variable level. MVIIL-GAN integrates missing value imputation and data generation in one step, improving the consistency of parameter optimization and the performance of prediction tasks. We evaluate our framework on the public MIMIC-IV dataset, which contains data with high missing rates and class imbalance. Experimental results show that MVIIL-GAN outperforms existing methods in prediction performance. The implementation of MVIIL-GAN is available at https://github.com/Peroxidess/MVIIL-GAN.
AB - Electronic health records (EHR) present challenges of incomplete and imbalanced data in clinical predictions. Previous studies addressed these two issues separately in a two-step process, which degraded the performance of prediction tasks. In this paper, we propose a unified framework that simultaneously addresses the challenges of incomplete and imbalanced data in EHR. Based on this framework, we develop a model called Missing Value Imputation and Imbalanced Learning Generative Adversarial Network (MVIIL-GAN). We use MVIIL-GAN to perform joint learning on the imputation process for data with high missing rates and the conditional generation process for EHR data. The joint learning is achieved by introducing two discriminators that distinguish the fake data from the generated data at the sample level and the variable level. MVIIL-GAN integrates missing value imputation and data generation in one step, improving the consistency of parameter optimization and the performance of prediction tasks. We evaluate our framework on the public MIMIC-IV dataset, which contains data with high missing rates and class imbalance. Experimental results show that MVIIL-GAN outperforms existing methods in prediction performance. The implementation of MVIIL-GAN is available at https://github.com/Peroxidess/MVIIL-GAN.
KW - Electronic health records
KW - Generative adversarial networks
KW - Imbalanced learning
KW - Missing values imputation
UR - http://www.scopus.com/inward/record.url?scp=85178048792&partnerID=8YFLogxK
U2 - 10.1016/j.compbiomed.2023.107687
DO - 10.1016/j.compbiomed.2023.107687
M3 - Article
C2 - 38007974
AN - SCOPUS:85178048792
SN - 0010-4825
VL - 168
JO - Computers in Biology and Medicine
JF - Computers in Biology and Medicine
M1 - 107687
ER -