AdaptiveConfig: Run-time configuration of cluster schedulers for cloud short-running jobs

Rui Han, Zan Zong, Lydia Y. Chen, Siyi Wang, Jianfeng Zhan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

Cluster schedulers provide a flexible resource-sharing mechanism for short-running jobs, which make up the majority of cloud jobs. A scheduler's configuration decides how resources are allocated among jobs and is therefore crucial to their performance. Today's cloud platforms usually rely on cluster administrators to set this configuration, so it is difficult to configure the scheduler optimally and minimize the latencies of heterogeneous and dynamically changing jobs in the cloud. In this paper, we introduce AdaptiveConfig, a run-time configurator for cluster schedulers that automatically adapts to the changing workload and resource status. AdaptiveConfig comprises two components. (1) An estimator calculates jobs' performance under different configurations and various scheduling scenarios. The key idea here is to transform a scheduler's resource allocation mechanisms and their variable influence factors (configuration parameters, scheduling constraints, available resources, and workload status) into business rules and facts in a rule engine, so that these correlated factors can be reasoned about together when estimating job performance. (2) A run-time optimizer efficiently searches the configuration space to find the optimal configuration for the current workload. We implemented AdaptiveConfig on the popular YARN Capacity and Fair schedulers and demonstrate its effectiveness using workloads of Facebook jobs: it reduces latencies by 2.22 times (and by up to 4.50 times) with low optimization overhead.
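The estimator-plus-optimizer loop described in the abstract can be illustrated with a deliberately small sketch. It is not the paper's rule-engine implementation: the Job fields, the FIFO "waves" latency model, the two-queue slot split, and the function names estimate_mean_latency and search_best_split are all assumptions made for illustration, and the exhaustive search stands in for whatever search strategy the authors actually use.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Job:
    queue: str           # name of the scheduler queue the job is submitted to
    tasks: int           # number of equal-sized tasks (hypothetical model)
    task_seconds: float  # run time of a single task


def estimate_mean_latency(jobs, slots_by_queue):
    """Toy estimator: each queue runs its jobs FIFO on its own slots.

    A job needs ceil(tasks / slots) "waves" of task_seconds each, and its
    latency includes the time spent waiting for earlier jobs in the queue.
    Assumes jobs is non-empty and every queue has at least one slot.
    """
    latencies = []
    for queue, slots in slots_by_queue.items():
        clock = 0.0
        for job in (j for j in jobs if j.queue == queue):
            clock += ceil(job.tasks / slots) * job.task_seconds
            latencies.append(clock)
    return sum(latencies) / len(latencies)


def search_best_split(jobs, total_slots, queues=("a", "b")):
    """Toy optimizer: try every integer slot split between two queues."""
    first, second = queues
    candidates = (
        {first: k, second: total_slots - k} for k in range(1, total_slots)
    )
    return min(candidates, key=lambda cfg: estimate_mean_latency(jobs, cfg))


# Example workload: one large job in queue "a", one small job in queue "b".
jobs = [Job("a", tasks=8, task_seconds=10.0), Job("b", tasks=2, task_seconds=10.0)]
best = search_best_split(jobs, total_slots=10)
print(best, estimate_mean_latency(jobs, best))
```

In this toy model the search would be rerun whenever the workload or the number of available slots changes, which is the sense in which the configuration is chosen at run time; a real scheduler has many more parameters and constraints than a single slot split.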

Original language: English
Title of host publication: Proceedings - 2018 IEEE 38th International Conference on Distributed Computing Systems, ICDCS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1519-1526
Number of pages: 8
ISBN (Electronic): 9781538668719
Publication status: Published - 19 Jul 2018
Externally published: Yes
Event: 38th IEEE International Conference on Distributed Computing Systems, ICDCS 2018 - Vienna, Austria
Duration: 2 Jul 2018 → 5 Jul 2018

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
Volume: 2018-July

Conference

Conference: 38th IEEE International Conference on Distributed Computing Systems, ICDCS 2018
Country/Territory: Austria
City: Vienna
Period: 2/07/18 → 5/07/18

Keywords

  • Cloud Computing
  • Cluster Scheduler
  • Run-Time Configuration
  • Short-Running Jobs
