Octopus: An End-to-end Multi-DAG Scheduling Method Based on Deep Reinforcement Learning

Yi Chang; Haosong Peng; Yufeng Zhan; Yuanqing Xia

doi:10.23919/CCC63176.2024.10662729

Yi Chang^*, Haosong Peng, Yufeng Zhan, Yuanqing Xia

^*Corresponding author for this work

School of Automation

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

With the rapid growth of cloud computing, more and more vendors are deploying their services to the cloud. Efficient job scheduling is essential for enhancing system operation performance. These services, represented as Directed Acyclic Graphs (DAGs), usually have intricate dependencies. Existing research has limitations in solving the multi-DAG job scheduling problem and often overlooks end-to-end scheduling directly from tasks to servers. For example, scheduling each job individually without considering the overall information of all jobs might lead to an extended total completion time. To address these issues, this paper proposes Octopus, an intelligent end-to-end multi-DAG job scheduling algorithm based on deep reinforcement learning. Octopus is designed to address the challenges of dynamic and large input dimensions in the multi-DAG scheduling problem. A graph neural network feature extraction module is designed to extract the topological structure of multi-DAG jobs. An improved kernel-based network is then used to handle dynamic inputs. Simulation experiments conducted on different scales of DAG jobs and servers demonstrate that the approach can reduce the overall completion time of multi-DAG jobs by up to 30% compared to traditional scheduling methods.

Original language	English
Title of host publication	Proceedings of the 43rd Chinese Control Conference, CCC 2024
Editors	Jing Na, Jian Sun
Publisher	IEEE Computer Society
Pages	2588-2593
Number of pages	6
ISBN (Electronic)	9789887581581
DOIs	https://doi.org/10.23919/CCC63176.2024.10662729
Publication status	Published - 2024
Event	43rd Chinese Control Conference, CCC 2024 - Kunming, China; Duration: 28 Jul 2024 → 31 Jul 2024

Publication series

Name	Chinese Control Conference, CCC
ISSN (Print)	1934-1768
ISSN (Electronic)	2161-2927

Conference

Conference	43rd Chinese Control Conference, CCC 2024
Country/Territory	China
City	Kunming
Period	28/07/24 → 31/07/24

Keywords

Deep Reinforcement Learning
End-to-end Scheduling
Graph Neural Network
Job Scheduling

Access to Document

10.23919/CCC63176.2024.10662729

Cite this

@inproceedings{891a6b86110e43968aa2f235107f4f3f,
title = "Octopus: An End-to-end Multi-DAG Scheduling Method Based on Deep Reinforcement Learning",
abstract = "With the rapid growth of cloud computing, more and more vendors are deploying their services to the cloud. Efficient job scheduling is essential for enhancing system operation performance. These services, represented as Directed Acyclic Graphs (DAGs), usually have intricate dependencies. Existing research has limitations in solving the multi-DAG job scheduling problem and often overlooks end-to-end scheduling directly from tasks to servers. For example, scheduling each job individually without considering the overall information of all jobs might lead to an extended total completion time. To address these issues, this paper proposes Octopus, an intelligent end-to-end multi-DAG jobs scheduling algorithm based on deep reinforcement learning. Octopus is designed to address the challenges of dynamic and large input dimensions in the multi-DAG scheduling problem. A graph neural network feature extraction module is designed to extract the topological structure of multi-DAG jobs. The improved kernel-based network is then used to handle dynamic inputs. Simulation experiments conducted on different scales of DAG jobs and servers demonstrate that our approach can reduce the overall completion time of multi-DAG jobs up to 30% compared to traditional scheduling methods.",
keywords = "Deep Reinforcement Learning, End-to-end Scheduling, Graph Neural Network, Job Scheduling",
author = "Yi Chang and Haosong Peng and Yufeng Zhan and Yuanqing Xia",
note = "Publisher Copyright: {\textcopyright} 2024 Technical Committee on Control Theory, Chinese Association of Automation.; 43rd Chinese Control Conference, CCC 2024 ; Conference date: 28-07-2024 Through 31-07-2024",
year = "2024",
doi = "10.23919/CCC63176.2024.10662729",
language = "English",
series = "Chinese Control Conference, CCC",
publisher = "IEEE Computer Society",
pages = "2588--2593",
editor = "Jing Na and Jian Sun",
booktitle = "Proceedings of the 43rd Chinese Control Conference, CCC 2024",
address = "United States",
}

Chang, Y, Peng, H, Zhan, Y & Xia, Y 2024, Octopus: An End-to-end Multi-DAG Scheduling Method Based on Deep Reinforcement Learning. in J Na & J Sun (eds), Proceedings of the 43rd Chinese Control Conference, CCC 2024. Chinese Control Conference, CCC, IEEE Computer Society, pp. 2588-2593, 43rd Chinese Control Conference, CCC 2024, Kunming, China, 28/07/24. https://doi.org/10.23919/CCC63176.2024.10662729

Octopus: An End-to-end Multi-DAG Scheduling Method Based on Deep Reinforcement Learning. / Chang, Yi; Peng, Haosong; Zhan, Yufeng et al.
Proceedings of the 43rd Chinese Control Conference, CCC 2024. ed. / Jing Na; Jian Sun. IEEE Computer Society, 2024. p. 2588-2593 (Chinese Control Conference, CCC).

TY  - GEN
T1  - Octopus
T2  - 43rd Chinese Control Conference, CCC 2024
AU  - Chang, Yi
AU  - Peng, Haosong
AU  - Zhan, Yufeng
AU  - Xia, Yuanqing
PY  - 2024
Y1  - 2024
N2  - With the rapid growth of cloud computing, more and more vendors are deploying their services to the cloud. Efficient job scheduling is essential for enhancing system operation performance. These services, represented as Directed Acyclic Graphs (DAGs), usually have intricate dependencies. Existing research has limitations in solving the multi-DAG job scheduling problem and often overlooks end-to-end scheduling directly from tasks to servers. For example, scheduling each job individually without considering the overall information of all jobs might lead to an extended total completion time. To address these issues, this paper proposes Octopus, an intelligent end-to-end multi-DAG jobs scheduling algorithm based on deep reinforcement learning. Octopus is designed to address the challenges of dynamic and large input dimensions in the multi-DAG scheduling problem. A graph neural network feature extraction module is designed to extract the topological structure of multi-DAG jobs. The improved kernel-based network is then used to handle dynamic inputs. Simulation experiments conducted on different scales of DAG jobs and servers demonstrate that our approach can reduce the overall completion time of multi-DAG jobs up to 30% compared to traditional scheduling methods.
AB  - With the rapid growth of cloud computing, more and more vendors are deploying their services to the cloud. Efficient job scheduling is essential for enhancing system operation performance. These services, represented as Directed Acyclic Graphs (DAGs), usually have intricate dependencies. Existing research has limitations in solving the multi-DAG job scheduling problem and often overlooks end-to-end scheduling directly from tasks to servers. For example, scheduling each job individually without considering the overall information of all jobs might lead to an extended total completion time. To address these issues, this paper proposes Octopus, an intelligent end-to-end multi-DAG jobs scheduling algorithm based on deep reinforcement learning. Octopus is designed to address the challenges of dynamic and large input dimensions in the multi-DAG scheduling problem. A graph neural network feature extraction module is designed to extract the topological structure of multi-DAG jobs. The improved kernel-based network is then used to handle dynamic inputs. Simulation experiments conducted on different scales of DAG jobs and servers demonstrate that our approach can reduce the overall completion time of multi-DAG jobs up to 30% compared to traditional scheduling methods.
KW  - Deep Reinforcement Learning
KW  - End-to-end Scheduling
KW  - Graph Neural Network
KW  - Job Scheduling
UR  - http://www.scopus.com/inward/record.url?scp=85205488205&partnerID=8YFLogxK
U2  - 10.23919/CCC63176.2024.10662729
DO  - 10.23919/CCC63176.2024.10662729
M3  - Conference contribution
AN  - SCOPUS:85205488205
T3  - Chinese Control Conference, CCC
SP  - 2588
EP  - 2593
BT  - Proceedings of the 43rd Chinese Control Conference, CCC 2024
A2  - Na, Jing
A2  - Sun, Jian
PB  - IEEE Computer Society
Y2  - 28 July 2024 through 31 July 2024
ER  -
