TY - JOUR
T1 - The Prediction of Hepatitis E through Ensemble Learning
AU - Peng, Tu
AU - Chen, Xiaoya
AU - Wan, Ming
AU - Jin, Lizhu
AU - Wang, Xiaofeng
AU - Du, Xuejie
AU - Ge, Hui
AU - Yang, Xu
N1 - Publisher Copyright:
© 2020, MDPI AG. All rights reserved.
PY - 2020/1/1
Y1 - 2020/1/1
AB - According to the World Health Organization, about 20 million people are infected with Hepatitis E every year. In 2015, there were 44,000 deaths worldwide due to hepatitis E virus (HEV) infection. Food, water and climate are key factors that affect outbreaks of Hepatitis E. This paper presents an ensemble learning model for Hepatitis E prediction based on the correlation between historical Hepatitis E cases and environmental factors (water quality and meteorological data). The environmental data contain many features; those most relevant to HEV are selected and fed into an ensemble learning model composed of a Gradient Boosting Decision Tree (GBDT) and a Random Forest for training and prediction. Three metrics, root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE), are used to compare the ensemble learning model with a classical time series prediction model. The results show that the ensemble learning model outperforms the classical model, and that prediction accuracy can be improved by incorporating water quality and meteorological factors (radiation, air pressure and precipitation).
KW - Ensemble learning
KW - Hepatitis E
KW - Prediction
UR - http://www.scopus.com/inward/record.url?scp=85098562073&partnerID=8YFLogxK
U2 - 10.3390/ijerph18010159
DO - 10.3390/ijerph18010159
M3 - Article
C2 - 33379298
AN - SCOPUS:85098562073
SN - 1661-7827
VL - 18
SP - 1
EP - 18
JO - International Journal of Environmental Research and Public Health
JF - International Journal of Environmental Research and Public Health
IS - 1
M1 - 159
ER -