TY - JOUR
T1 - Rafiki: Machine learning as an analytics service system
T2 - 45th International Conference on Very Large Data Bases, VLDB 2019
AU - Wang, Wei
AU - Gao, Jinyang
AU - Zhang, Meihui
AU - Wang, Sheng
AU - Chen, Gang
AU - Ng, Teck Khim
AU - Ooi, Beng Chin
AU - Shao, Jie
AU - Reyad, Moaz
N1 - Publisher Copyright: © 2018 VLDB Endowment 21508097/18/07.
PY - 2018
Y1 - 2018
AB - Big data analytics has gained massive momentum over the last few years. Applying machine learning models to big data has become an implicit requirement or expectation for most analysis tasks, especially in high-stakes applications. Typical applications include sentiment analysis of reviews for analyzing on-line products, image classification in food logging applications for monitoring users' daily intake, and stock movement prediction. Extending traditional database systems to support such analysis is intriguing but challenging. First, it is almost impossible to implement all machine learning models in the database engines. Second, expert knowledge is required to optimize the training and inference procedures in terms of efficiency and effectiveness, which imposes a heavy burden on system users. In this paper, we develop and present a system, called Rafiki, that provides training and inference services for machine learning models. Rafiki provides distributed hyper-parameter tuning for the training service, and online ensemble modeling for the inference service, which trades off between latency and accuracy. Experimental results confirm the efficiency, effectiveness, scalability and usability of Rafiki.
UR - http://www.scopus.com/inward/record.url?scp=85061759520&partnerID=8YFLogxK
U2 - 10.14778/3282495.3282499
DO - 10.14778/3282495.3282499
M3 - Conference article
AN - SCOPUS:85061759520
SN - 2150-8097
VL - 12
SP - 128
EP - 140
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 2
Y2 - 26 August 2019 through 30 August 2019
ER -