PCS: Predictive component-level scheduling for reducing tail latency in cloud online services

Rui Han; Junwei Wang; Siguang Huang; Chenrong Shao; Shulin Zhan; Jianfeng Zhan; Jose Luis Vazquez-Poletti

doi:10.1109/ICPP.2015.58

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

8 Citations (Scopus)

Abstract

Modern latency-critical online services often rely on composing results from a large number of server components. Hence the tail latency (e.g., the 99th percentile of response time), rather than the average, of these components determines the overall service performance. When hosted in a cloud environment, the components of a service typically co-locate with short batch jobs to increase machine utilization, and share and contend for resources such as caches and I/O bandwidth with them. The highly dynamic nature of batch jobs in terms of their workload types and input sizes causes continuously changing performance interference to individual components, hence leading to their latency variability and high tail latency. However, existing techniques either ignore such fine-grained component latency variability when managing service performance, or rely on executing redundant requests to reduce the tail latency, which deteriorates the service performance as the load gets heavier. In this paper, we propose PCS, a predictive and component-level scheduling framework to reduce tail latency for large-scale, parallel online services. It uses an analytical performance model to simultaneously predict the component latency and the overall service performance on different nodes. Based on the predicted performance, the scheduler identifies straggling components and conducts near-optimal component-node allocations to adapt to the changing performance interference from batch jobs. We demonstrate that, using realistic workloads, the proposed scheduler reduces the component tail latency by an average of 67.05% and the average overall service latency by 64.16% compared with the state-of-the-art techniques on reducing tail latency.
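The abstract's step from "many components" to "the tail, not the average, determines overall performance" can be made concrete with a short calculation. The Python sketch below is illustrative only and is not the analytical performance model used by PCS. It assumes that a request waits for every one of its fan-out components, that component latencies are independent, and that each component exceeds its own 99th-percentile latency on 1% of requests. The helper names percentile and prob_request_hits_tail and the sample latencies are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank p-th percentile (0 < p <= 100) of a list of latencies."""
    ordered = sorted(samples)
    # Multiply before dividing so that p=99, n=100 gives a rank of exactly 99.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def prob_request_hits_tail(fanout, tail_fraction=0.01):
    """Probability that at least one of `fanout` components is slower than
    its own tail threshold. Assumes independent components, so a request
    that waits for all of them is delayed by at least one straggler."""
    return 1 - (1 - tail_fraction) ** fanout


if __name__ == "__main__":
    # Hypothetical latency samples of a single component: 1, 2, ..., 100 ms.
    latencies_ms = list(range(1, 101))
    print("p99 latency (ms):", percentile(latencies_ms, 99))  # 99
    for n in (1, 10, 100):
        print(f"fan-out {n}: {round(prob_request_hits_tail(n), 3)}")
    # fan-out 1: 0.01, fan-out 10: 0.096, fan-out 100: 0.634
```

Under these assumptions, a request fanned out to 100 components is delayed by at least one component's tail latency about 63% of the time. This is why per-component stragglers, rather than average latency, dominate the response time of the composed service.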

Original language	English
Host publication title	Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	490-499
Number of pages	10
ISBN (Electronic)	9781467375870
DOI	https://doi.org/10.1109/ICPP.2015.58
Publication status	Published - 8 Dec 2015
Externally published	Yes
Event	44th International Conference on Parallel Processing, ICPP 2015 - Beijing, China. Duration: 1 Sep 2015 → 4 Sep 2015

Publication series

Name	Proceedings of the International Conference on Parallel Processing
Volume	2015-December
ISSN (Print)	0190-3918

Conference

Conference	44th International Conference on Parallel Processing, ICPP 2015
Country/Territory	China
City	Beijing
Period	1/09/15 → 4/09/15


Cite this

Han, R., Wang, J., Huang, S., Shao, C., Zhan, S., Zhan, J., & Vazquez-Poletti, J. L. (2015). PCS: Predictive component-level scheduling for reducing tail latency in cloud online services. In Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015 (pp. 490-499). Article 7349604 (Proceedings of the International Conference on Parallel Processing; Vol. 2015-December). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPP.2015.58

Han, Rui; Wang, Junwei; Huang, Siguang et al. / PCS: Predictive component-level scheduling for reducing tail latency in cloud online services. Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 490-499 (Proceedings of the International Conference on Parallel Processing).

@inproceedings{42d9fb7007924e768bf305989ce976dd,

title = "PCS: Predictive component-level scheduling for reducing tail latency in cloud online services",

abstract = "Modern latency-critical online services often rely on composing results from a large number of server components. Hence the tail latency (e.g., the 99th percentile of response time), rather than the average, of these components determines the overall service performance. When hosted in a cloud environment, the components of a service typically co-locate with short batch jobs to increase machine utilization, and share and contend for resources such as caches and I/O bandwidth with them. The highly dynamic nature of batch jobs in terms of their workload types and input sizes causes continuously changing performance interference to individual components, hence leading to their latency variability and high tail latency. However, existing techniques either ignore such fine-grained component latency variability when managing service performance, or rely on executing redundant requests to reduce the tail latency, which deteriorates the service performance as the load gets heavier. In this paper, we propose PCS, a predictive and component-level scheduling framework to reduce tail latency for large-scale, parallel online services. It uses an analytical performance model to simultaneously predict the component latency and the overall service performance on different nodes. Based on the predicted performance, the scheduler identifies straggling components and conducts near-optimal component-node allocations to adapt to the changing performance interference from batch jobs. We demonstrate that, using realistic workloads, the proposed scheduler reduces the component tail latency by an average of 67.05\% and the average overall service latency by 64.16\% compared with the state-of-the-art techniques on reducing tail latency.",

keywords = "Cloud online services, Component latency variability, Predictive scheduler, Tail latency",

author = "Rui Han and Junwei Wang and Siguang Huang and Chenrong Shao and Shulin Zhan and Jianfeng Zhan and Vazquez-Poletti, {Jose Luis}",

note = "Publisher Copyright: {\textcopyright} 2015 IEEE.; 44th International Conference on Parallel Processing, ICPP 2015 ; Conference date: 01-09-2015 Through 04-09-2015",

year = "2015",

month = dec,

day = "8",

doi = "10.1109/ICPP.2015.58",

language = "English",

series = "Proceedings of the International Conference on Parallel Processing",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "490--499",

booktitle = "Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015",

address = "United States",

}

Han, R, Wang, J, Huang, S, Shao, C, Zhan, S, Zhan, J & Vazquez-Poletti, JL 2015, PCS: Predictive component-level scheduling for reducing tail latency in cloud online services. in Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015, 7349604, Proceedings of the International Conference on Parallel Processing, vol. 2015-December, Institute of Electrical and Electronics Engineers Inc., pp. 490-499, 44th International Conference on Parallel Processing, ICPP 2015, Beijing, China, 1/09/15. https://doi.org/10.1109/ICPP.2015.58

PCS: Predictive component-level scheduling for reducing tail latency in cloud online services. / Han, Rui; Wang, Junwei; Huang, Siguang et al.
Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015. Institute of Electrical and Electronics Engineers Inc., 2015. p. 490-499 7349604 (Proceedings of the International Conference on Parallel Processing; Vol. 2015-December).

TY - GEN

T1 - PCS: Predictive component-level scheduling for reducing tail latency in cloud online services

T2 - 44th International Conference on Parallel Processing, ICPP 2015

AU - Han, Rui

AU - Wang, Junwei

AU - Huang, Siguang

AU - Shao, Chenrong

AU - Zhan, Shulin

AU - Zhan, Jianfeng

AU - Vazquez-Poletti, Jose Luis

PY - 2015/12/8

Y1 - 2015/12/8

N2 - Modern latency-critical online services often rely on composing results from a large number of server components. Hence the tail latency (e.g., the 99th percentile of response time), rather than the average, of these components determines the overall service performance. When hosted in a cloud environment, the components of a service typically co-locate with short batch jobs to increase machine utilization, and share and contend for resources such as caches and I/O bandwidth with them. The highly dynamic nature of batch jobs in terms of their workload types and input sizes causes continuously changing performance interference to individual components, hence leading to their latency variability and high tail latency. However, existing techniques either ignore such fine-grained component latency variability when managing service performance, or rely on executing redundant requests to reduce the tail latency, which deteriorates the service performance as the load gets heavier. In this paper, we propose PCS, a predictive and component-level scheduling framework to reduce tail latency for large-scale, parallel online services. It uses an analytical performance model to simultaneously predict the component latency and the overall service performance on different nodes. Based on the predicted performance, the scheduler identifies straggling components and conducts near-optimal component-node allocations to adapt to the changing performance interference from batch jobs. We demonstrate that, using realistic workloads, the proposed scheduler reduces the component tail latency by an average of 67.05% and the average overall service latency by 64.16% compared with the state-of-the-art techniques on reducing tail latency.

AB - Modern latency-critical online services often rely on composing results from a large number of server components. Hence the tail latency (e.g., the 99th percentile of response time), rather than the average, of these components determines the overall service performance. When hosted in a cloud environment, the components of a service typically co-locate with short batch jobs to increase machine utilization, and share and contend for resources such as caches and I/O bandwidth with them. The highly dynamic nature of batch jobs in terms of their workload types and input sizes causes continuously changing performance interference to individual components, hence leading to their latency variability and high tail latency. However, existing techniques either ignore such fine-grained component latency variability when managing service performance, or rely on executing redundant requests to reduce the tail latency, which deteriorates the service performance as the load gets heavier. In this paper, we propose PCS, a predictive and component-level scheduling framework to reduce tail latency for large-scale, parallel online services. It uses an analytical performance model to simultaneously predict the component latency and the overall service performance on different nodes. Based on the predicted performance, the scheduler identifies straggling components and conducts near-optimal component-node allocations to adapt to the changing performance interference from batch jobs. We demonstrate that, using realistic workloads, the proposed scheduler reduces the component tail latency by an average of 67.05% and the average overall service latency by 64.16% compared with the state-of-the-art techniques on reducing tail latency.

KW - Cloud online services

KW - Component latency variability

KW - Predictive scheduler

KW - Tail latency

UR - http://www.scopus.com/inward/record.url?scp=84976463179&partnerID=8YFLogxK

U2 - 10.1109/ICPP.2015.58

DO - 10.1109/ICPP.2015.58

M3 - Conference contribution

AN - SCOPUS:84976463179

T3 - Proceedings of the International Conference on Parallel Processing

SP - 490

EP - 499

BT - Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 1 September 2015 through 4 September 2015

ER -

Han R, Wang J, Huang S, Shao C, Zhan S, Zhan J et al. PCS: Predictive component-level scheduling for reducing tail latency in cloud online services. In Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 490-499. 7349604. (Proceedings of the International Conference on Parallel Processing). doi: 10.1109/ICPP.2015.58
