Multi-job Merging Framework and Scheduling Optimization for Apache Flink

Hangxu Ji, Gang Wu*, Yuhai Zhao, Ye Yuan, Guoren Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

With the popularization of big data technology, distributed computing systems are constantly evolving and maturing, making substantial contributions to the query and analysis of massive data. However, the insufficient utilization of system resources is an inherent problem of distributed computing engines. In particular, when a large number of jobs leads to execution blocking, the system schedules multiple jobs on a first-come-first-executed (FCFE) basis, even if many resources in the cluster remain idle. Therefore, optimizing resource utilization is key to improving the efficiency of multi-job execution. We investigated the field of multi-job execution optimization, designed a multi-job merging framework and a scheduling optimization algorithm, and implemented them in the latest generation of distributed computing systems, Apache Flink. In summary, the advantages of our work are as follows: (1) the framework enables Flink to support multi-job collection, merging, and dynamic tuning of the execution sequence, and the selection of these functions is customizable; (2) with multi-job merging and optimization, the total running time can be reduced by 31% compared with traditional sequential execution; (3) the multi-job scheduling optimization algorithm can bring a 28% performance improvement and, in the average case, can reduce idle cluster resources by 61%.

Original language: English
Title of host publication: Database Systems for Advanced Applications - 26th International Conference, DASFAA 2021, Proceedings
Editors: Christian S. Jensen, Ee-Peng Lim, De-Nian Yang, Wang-Chien Lee, Vincent S. Tseng, Vana Kalogeraki, Jen-Wei Huang, Chih-Ya Shen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 20-36
Number of pages: 17
ISBN (Print): 9783030731939
Publication status: Published - 2021
Event: 26th International Conference on Database Systems for Advanced Applications, DASFAA 2021 - Taipei, Taiwan, Province of China
Duration: 11 Apr 2021 to 14 Apr 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12681 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th International Conference on Database Systems for Advanced Applications, DASFAA 2021
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 11/04/21 to 14/04/21

Keywords

  • Distributed computing
  • Flink
  • Multi-job merging
  • Scheduling optimization
