TY - JOUR
T1 - CRLEDD: Regularized Causalities Learning for Early Detection of Diseases Using Electronic Health Record (EHR) Data
AU - Bian, Jiang
AU - Yang, Sijia
AU - Xiong, Haoyi
AU - Wang, Licheng
AU - Fu, Yanjie
AU - Sun, Zeyi
AU - Guo, Zhishan
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2021/8
Y1 - 2021/8
AB - The availability of Electronic Health Records (EHR) in health care settings has provided tremendous opportunities for early disease detection. While many supervised learning models have been adopted for EHR-based early disease detection, the ill-posed inverse problem in parameter learning poses a significant challenge to improving the accuracy of these algorithms. In this paper, we propose CRLEDD (Causality-Regularized Learning for Early Detection of Disease), an algorithm that improves the performance of Linear Discriminant Analysis (LDA) on top of a diagnosis-frequency vector data representation. While most existing regularization methods exploit sparsity regularization to improve detection performance, CRLEDD takes a different approach from conventional regularization (e.g., L2 regularization) by ensuring positive semi-definiteness of the sparsified precision matrix used in LDA. To achieve this goal, CRLEDD employs Graphical Lasso to estimate the precision matrix in ill-posed settings, enhancing the accuracy of LDA classifiers. We perform an extensive evaluation of CRLEDD using a large-scale real-world EHR dataset to predict mental health disorders (e.g., depression and anxiety) of college students from 10 universities in the U.S. We compare CRLEDD with other regularized LDA and downstream classifiers. The results show that CRLEDD outperforms all baselines in terms of accuracy and F1 score.
KW - Classification algorithms
KW - detection algorithms
KW - linear discriminant analysis
UR - https://www.scopus.com/pages/publications/85099547992
U2 - 10.1109/TETCI.2020.3010017
DO - 10.1109/TETCI.2020.3010017
M3 - Article
AN - SCOPUS:85099547992
SN - 2471-285X
VL - 5
SP - 541
EP - 553
JO - IEEE Transactions on Emerging Topics in Computational Intelligence
JF - IEEE Transactions on Emerging Topics in Computational Intelligence
IS - 4
M1 - 9163317
ER -