Octopus: An End-to-end Multi-DAG Scheduling Method Based on Deep Reinforcement Learning

Yi Chang*, Haosong Peng, Yufeng Zhan, Yuanqing Xia

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

With the rapid growth of cloud computing, more and more vendors are deploying their services to the cloud. Efficient job scheduling is essential for enhancing system performance. These services, represented as Directed Acyclic Graphs (DAGs), usually have intricate dependencies. Existing research has limitations in solving the multi-DAG job scheduling problem and often overlooks end-to-end scheduling directly from tasks to servers. For example, scheduling each job individually without considering the overall information of all jobs might lead to an extended total completion time. To address these issues, this paper proposes Octopus, an intelligent end-to-end multi-DAG job scheduling algorithm based on deep reinforcement learning. Octopus is designed to address the challenges of dynamic and large input dimensions in the multi-DAG scheduling problem. A graph neural network feature extraction module is designed to extract the topological structure of multi-DAG jobs. An improved kernel-based network is then used to handle dynamic inputs. Simulation experiments conducted on different scales of DAG jobs and servers demonstrate that our approach can reduce the overall completion time of multi-DAG jobs by up to 30% compared to traditional scheduling methods.
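The problem setting described in the abstract can be illustrated with a small sketch. This is not the Octopus algorithm (no graph neural network or reinforcement learning is involved); it is a plain greedy list scheduler, included only to show what mapping tasks from several DAG jobs directly onto servers looks like. The function name `schedule`, the job encoding, the task durations, and the assumption of identical servers are all hypothetical and do not come from the paper.

```python
from collections import defaultdict


def schedule(jobs, num_servers):
    """Greedy list scheduling of several DAG jobs onto identical servers.

    jobs maps a job id to (durations, edges), where durations maps a task
    name to its run time and edges is a list of (parent, child) pairs.
    Returns (makespan, plan); plan maps (job, task) to (server, start, end).
    Illustrative baseline only, not the method proposed in the paper.
    """
    duration = {}
    parents = defaultdict(set)
    for job_id, (durations, edges) in jobs.items():
        for task, run_time in durations.items():
            duration[(job_id, task)] = run_time
        for parent, child in edges:
            parents[(job_id, child)].add((job_id, parent))

    server_free = [0.0] * num_servers  # time at which each server becomes idle
    finish = {}                        # finish time of every scheduled task
    plan = {}
    remaining = set(duration)

    def earliest_start(task):
        # A task cannot start before all of its parents have finished.
        return max((finish[p] for p in parents[task]), default=0.0)

    while remaining:
        # A task is ready once every parent already has a finish time.
        ready = [t for t in remaining if all(p in finish for p in parents[t])]
        if not ready:
            raise ValueError("dependency cycle: input is not a DAG")
        # Pick the ready task that can start soonest; break ties by longest run time.
        task = min(ready, key=lambda t: (earliest_start(t), -duration[t], t))
        lower_bound = earliest_start(task)
        # Place it on the server where it can start earliest.
        server = min(range(num_servers),
                     key=lambda s: max(server_free[s], lower_bound))
        start = max(server_free[server], lower_bound)
        end = start + duration[task]
        server_free[server] = end
        finish[task] = end
        plan[task] = (server, start, end)
        remaining.remove(task)

    return max(finish.values()), plan


# Two hypothetical jobs, each a two-task chain.
jobs = {
    "A": ({"a1": 2.0, "a2": 3.0}, [("a1", "a2")]),
    "B": ({"b1": 4.0, "b2": 1.0}, [("b1", "b2")]),
}

# Joint scheduling: tasks of both jobs compete for the same two servers.
joint_makespan, joint_plan = schedule(jobs, num_servers=2)

# One simple reading of "scheduling each job individually": run the jobs
# back to back, each with exclusive use of the two servers.
one_by_one = sum(schedule({job_id: spec}, num_servers=2)[0]
                 for job_id, spec in jobs.items())

print(joint_makespan)  # 5.0
print(one_by_one)      # 10.0
```

In this toy instance the joint schedule finishes in half the time of the back-to-back one, because each chain leaves one server idle when run alone. The gap depends entirely on the chosen durations and says nothing about the results reported in the paper.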

Original language: English
Title of host publication: Proceedings of the 43rd Chinese Control Conference, CCC 2024
Editors: Jing Na, Jian Sun
Publisher: IEEE Computer Society
Pages: 2588-2593
Number of pages: 6
ISBN (Electronic): 9789887581581
Publication status: Published - 2024
Event: 43rd Chinese Control Conference, CCC 2024 - Kunming, China
Duration: 28 Jul 2024 to 31 Jul 2024

Publication series

Name: Chinese Control Conference, CCC
ISSN (Print): 1934-1768
ISSN (Electronic): 2161-2927

Conference

Conference: 43rd Chinese Control Conference, CCC 2024
Country/Territory: China
City: Kunming
Period: 28/07/24 to 31/07/24

Keywords

  • Deep Reinforcement Learning
  • End-to-end Scheduling
  • Graph Neural Network
  • Job Scheduling
