Multi-job Merging Framework and Scheduling Optimization for Apache Flink

Hangxu Ji, Gang Wu*, Yuhai Zhao, Ye Yuan, Guoren Wang

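*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract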

With the popularization of big data technology, distributed computing systems are constantly evolving and maturing, making substantial contributions to the query and analysis of massive data. However, insufficient utilization of system resources is an inherent problem of distributed computing engines. In particular, when a growing number of jobs causes execution to block, the system schedules the jobs on a first-come-first-executed (FCFE) basis, even if many resources in the cluster remain unused. Therefore, optimizing resource utilization is key to improving the efficiency of multi-job execution. We investigated the field of multi-job execution optimization, designed a multi-job merging framework and a scheduling optimization algorithm, and implemented them in Apache Flink, a latest-generation distributed computing system. In summary, the advantages of our work are as follows: (1) The framework enables Flink to support multi-job collection, merging, and dynamic tuning of the execution sequence, and the selection of these functions is customizable. (2) With multi-job merging and optimization, the total running time can be reduced by 31% compared with traditional sequential execution. (3) The multi-job scheduling optimization algorithm can bring a 28% performance improvement and, in the average case, can reduce idle cluster resources by 61%.
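The blocking behaviour the abstract attributes to FCFE scheduling can be illustrated with a toy simulation. The sketch below is not the paper's algorithm and uses no Flink API; the function `simulate`, the `(slots, duration)` job model, and the simple "start any waiting job that fits" policy are all assumptions made for illustration only. It shows how strict arrival-order execution can leave slots idle while a reordering policy fills them.

```python
import heapq

def simulate(jobs, total_slots, allow_reorder):
    """Toy model (hypothetical): jobs are (slots, duration) in arrival order.

    Returns (makespan, idle_slot_time). With allow_reorder=False only the
    head of the queue may start (FCFE); with True, any waiting job that
    fits into the free slots may start.
    """
    if any(slots > total_slots for slots, _ in jobs):
        raise ValueError("a job needs more slots than the cluster has")
    waiting = list(jobs)
    running = []                      # min-heap of (finish_time, slots)
    now, free, busy_slot_time = 0, total_slots, 0
    while waiting or running:
        started = True
        while started and waiting:
            started = False
            # FCFE looks only at the first waiting job.
            candidates = waiting if allow_reorder else waiting[:1]
            for job in candidates:
                slots, duration = job
                if slots <= free:
                    waiting.remove(job)
                    heapq.heappush(running, (now + duration, slots))
                    free -= slots
                    busy_slot_time += slots * duration
                    started = True
                    break
        # Advance the clock to the next job completion.
        finish, slots = heapq.heappop(running)
        now = finish
        free += slots
    return now, total_slots * now - busy_slot_time

# Illustrative workload (made up): four jobs on a 4-slot cluster.
jobs = [(3, 4), (3, 4), (1, 4), (1, 4)]
fcfe = simulate(jobs, total_slots=4, allow_reorder=False)
reordered = simulate(jobs, total_slots=4, allow_reorder=True)
print("FCFE      (makespan, idle slot-time):", fcfe)
print("Reordered (makespan, idle slot-time):", reordered)
```

In this made-up workload the second job blocks the queue under FCFE while one slot sits idle; letting the small jobs run alongside the large ones shortens the makespan and removes the idle slot-time. The figures reported in the abstract come from the authors' own framework and experiments, not from a model like this one.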

Original language: English
Title of host publication: Database Systems for Advanced Applications - 26th International Conference, DASFAA 2021, Proceedings
Editors: Christian S. Jensen, Ee-Peng Lim, De-Nian Yang, Wang-Chien Lee, Vincent S. Tseng, Vana Kalogeraki, Jen-Wei Huang, Chih-Ya Shen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 20-36
Number of pages: 17
ISBN (Print): 9783030731939
DOI
Publication status: Published - 2021
Event: 26th International Conference on Database Systems for Advanced Applications, DASFAA 2021 - Taipei, Taiwan, China
Duration: 11 Apr 2021 to 14 Apr 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12681 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th International Conference on Database Systems for Advanced Applications, DASFAA 2021
Country/Territory: Taiwan, China
City: Taipei
Period: 11/04/21 to 14/04/21
