Online Runtime Prediction Method for Distributed Iterative Jobs

Xiaofei Yue, Lan Shi, Yuhai Zhao*, Hangxu Ji, Guoren Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Citation (Scopus)

Abstract

Predicting the runtime of distributed iterative jobs can help reduce the deployment cost of clusters and optimize their resource allocation and scheduling strategies, but the runtime depends on various factors that are difficult to obtain before execution. In this paper, we propose a generalized online prediction method for the runtime of distributed iterative jobs, centered on a series of online machine learning models. The method consists of three phases: 1) estimating the number of iterations for the current iterative job; 2) predicting the runtime metrics of each iteration with an online polynomial regression model; and 3) analyzing the sequence of runtime metrics with an LSTM trained by online learning to predict the runtime of each iteration. We conducted experiments on typical Flink iterative jobs, and the results show that our method improves accuracy by 4.79% compared to state-of-the-art methods, while the improvement in accuracy for delta iterative jobs is more than 15%.
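As an illustration of the second phase only, the following is a minimal sketch of an online polynomial regression that is refined after every observed iteration. The abstract does not state how the model is updated, so the recursive least squares rule used here is an assumption, as are the class name `OnlinePolyRegressor`, the polynomial degree, the use of the iteration index as the only input, and the synthetic metric values.

```python
class OnlinePolyRegressor:
    """Polynomial fit updated one sample at a time.

    Assumption: recursive least squares (RLS) is used as the online update;
    the paper's actual update rule is not given in the abstract.
    """

    def __init__(self, degree=2, delta=1e6):
        self.n = degree + 1
        self.w = [0.0] * self.n  # polynomial coefficients, lowest order first
        # P tracks the inverse of the (regularized) feature correlation matrix.
        # A large delta means a weak prior on the coefficients.
        self.P = [[delta if i == j else 0.0 for j in range(self.n)]
                  for i in range(self.n)]

    def _features(self, x):
        # Polynomial feature vector [1, x, x^2, ...]
        return [x ** k for k in range(self.n)]

    def predict(self, x):
        return sum(w * f for w, f in zip(self.w, self._features(x)))

    def update(self, x, y):
        """Fold one observation (x, y) into the model."""
        phi = self._features(x)
        p_phi = [sum(self.P[i][j] * phi[j] for j in range(self.n))
                 for i in range(self.n)]
        denom = 1.0 + sum(phi[i] * p_phi[i] for i in range(self.n))
        gain = [v / denom for v in p_phi]
        err = y - self.predict(x)  # prediction error before the update
        self.w = [w + g * err for w, g in zip(self.w, gain)]
        # P <- P - gain * (phi^T P); P is symmetric, so phi^T P equals p_phi^T.
        self.P = [[self.P[i][j] - gain[i] * p_phi[j] for j in range(self.n)]
                  for i in range(self.n)]


if __name__ == "__main__":
    # Hypothetical metric (e.g. records processed) observed per iteration.
    model = OnlinePolyRegressor(degree=2)
    for iteration in range(1, 11):
        observed = 3.0 + 2.0 * iteration + 0.5 * iteration ** 2
        model.update(float(iteration), observed)
    print("predicted metric for iteration 11:", model.predict(11.0))
```

In the method described above, per-iteration metric predictions of this kind would form the sequence passed to the LSTM in the third phase; that step is not sketched here.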

Original language: English
Title of host publication: Web Information Systems and Applications - 18th International Conference, WISA 2021, Proceedings
Editors: Chunxiao Xing, Xiaoming Fu, Yong Zhang, Guigang Zhang, Chaolemen Borjigin
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 156-168
Number of pages: 13
ISBN (Print): 9783030875701
Publication status: Published - 2021
Event: 18th International Conference on Web Information Systems and Applications, WISA 2021 - Kaifeng, China
Duration: 24 Sept 2021 to 26 Sept 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12999 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Web Information Systems and Applications, WISA 2021
Country/Territory: China
City: Kaifeng
Period: 24/09/21 to 26/09/21

Keywords

  • Flink
  • Iterative job
  • LSTM
  • Online runtime prediction
  • Polynomial regression
