TY - JOUR
T1 - Improving Data Analytics with Fast and Adaptive Regularization
AU - Luo, Zhaojing
AU - Cai, Shaofeng
AU - Chen, Gang
AU - Gao, Jinyang
AU - Lee, Wang Chien
AU - Ngiam, Kee Yuan
AU - Zhang, Meihui
N1 - Publisher Copyright: © 1989-2012 IEEE.
PY - 2021/2/1
Y1 - 2021/2/1
AB - Deep learning and machine learning models have recently been shown to be effective in many real-world applications. While these models achieve increasingly better predictive performance, their structures have also become much more complex. A common and difficult problem for complex models is overfitting. Regularization is used to penalize the complexity of the model in order to avoid overfitting. However, in most learning frameworks, the regularization function is controlled by hyper-parameters whose best setting is difficult to find. In this paper, we propose an adaptive regularization method, as part of a large end-to-end healthcare data analytics software stack, which effectively addresses the above difficulty. First, we propose a general adaptive regularization method based on a Gaussian Mixture (GM) to learn the best regularization function according to the observed parameters. Second, we develop an effective update algorithm which integrates Expectation Maximization (EM) with Stochastic Gradient Descent (SGD). Third, we design a lazy update algorithm and a sparse update algorithm that reduce the computational cost by 4x and 20x, respectively. The overall regularization framework is fast, adaptive, and easy to use. We validate the effectiveness of our regularization method through an extensive experimental study over 14 standard benchmark datasets and three kinds of deep learning/machine learning models. The results illustrate that our proposed adaptive regularization method achieves significant improvement over state-of-the-art regularization methods.
KW - Adaptive regularization
KW - complex analytics
KW - data analytics
KW - data science
KW - knowledge discovery and data mining
UR - http://www.scopus.com/inward/record.url?scp=85099483484&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2019.2916683
DO - 10.1109/TKDE.2019.2916683
M3 - Article
AN - SCOPUS:85099483484
SN - 1041-4347
VL - 33
SP - 551
EP - 568
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 2
M1 - 8713578
ER -