TY - GEN
T1 - Interference-Aware Component Scheduling for Reducing Tail Latency in Cloud Interactive Services
AU - Han, Rui
AU - Wang, Junwei
AU - Huang, Siguang
AU - Shao, Chenrong
AU - Zhan, Shulin
AU - Zhan, Jianfeng
AU - Vazquez-Poletti, Jose Luis
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/7/22
Y1 - 2015/7/22
N2 - Large-scale interactive services usually divide requests into multiple sub-requests and distribute them to a large number of server components for parallel execution. Hence the tail latency (i.e., the slowest component's latency) of these components determines the overall service latency. On a cloud platform, each component shares and competes for node resources such as caches and I/O bandwidth with its co-located jobs, and hence inevitably suffers from their performance interference. In this paper, we study the short-running jobs in a 12k-node Google cluster to illustrate their dynamic resource demands, which cause individual components' latencies to vary both over time and across different nodes, posing a major challenge to maintaining low tail latency. Motivated by this observation, this paper introduces a dynamic, interference-aware scheduler for large-scale, parallel cloud services. At each scheduling interval, the scheduler collects workload and resource contention information from the running service, and predicts both the latency of each component on different nodes and the overall service performance. Based on the predicted performance, it identifies straggling components and conducts near-optimal component-node allocations to adapt to changing workloads and performance interference. We demonstrate that, on realistic workloads, the proposed approach achieves significant reductions in tail latency compared to a baseline approach without scheduling.
AB - Large-scale interactive services usually divide requests into multiple sub-requests and distribute them to a large number of server components for parallel execution. Hence the tail latency (i.e., the slowest component's latency) of these components determines the overall service latency. On a cloud platform, each component shares and competes for node resources such as caches and I/O bandwidth with its co-located jobs, and hence inevitably suffers from their performance interference. In this paper, we study the short-running jobs in a 12k-node Google cluster to illustrate their dynamic resource demands, which cause individual components' latencies to vary both over time and across different nodes, posing a major challenge to maintaining low tail latency. Motivated by this observation, this paper introduces a dynamic, interference-aware scheduler for large-scale, parallel cloud services. At each scheduling interval, the scheduler collects workload and resource contention information from the running service, and predicts both the latency of each component on different nodes and the overall service performance. Based on the predicted performance, it identifies straggling components and conducts near-optimal component-node allocations to adapt to changing workloads and performance interference. We demonstrate that, on realistic workloads, the proposed approach achieves significant reductions in tail latency compared to a baseline approach without scheduling.
KW - Cloud interactive services
KW - Interference-aware scheduler
KW - component latency variability
KW - tail latency
UR - http://www.scopus.com/inward/record.url?scp=84944321452&partnerID=8YFLogxK
U2 - 10.1109/ICDCS.2015.88
DO - 10.1109/ICDCS.2015.88
M3 - Conference contribution
AN - SCOPUS:84944321452
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 744
EP - 745
BT - Proceedings - 2015 IEEE 35th International Conference on Distributed Computing Systems, ICDCS 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 35th IEEE International Conference on Distributed Computing Systems, ICDCS 2015
Y2 - 29 June 2015 through 2 July 2015
ER -