Octopus: An End-to-end Multi-DAG Scheduling Method Based on Deep Reinforcement Learning

Yi Chang; Haosong Peng; Yufeng Zhan; Yuanqing Xia

doi:10.23919/CCC63176.2024.10662729

Yi Chang^*, Haosong Peng, Yufeng Zhan, Yuanqing Xia

^*Corresponding author for this work

School of Automation

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

With the rapid growth of cloud computing, more and more vendors are deploying their services to the cloud. Efficient job scheduling is essential for enhancing system operation performance. These services, represented as Directed Acyclic Graphs (DAGs), usually have intricate dependencies. Existing research has limitations in solving the multi-DAG job scheduling problem and often overlooks end-to-end scheduling directly from tasks to servers. For example, scheduling each job individually without considering the overall information of all jobs might lead to an extended total completion time. To address these issues, this paper proposes Octopus, an intelligent end-to-end multi-DAG job scheduling algorithm based on deep reinforcement learning. Octopus is designed to address the challenges of dynamic and large input dimensions in the multi-DAG scheduling problem. A graph neural network feature extraction module is designed to extract the topological structure of multi-DAG jobs. The improved kernel-based network is then used to handle dynamic inputs. Simulation experiments conducted on different scales of DAG jobs and servers demonstrate that our approach can reduce the overall completion time of multi-DAG jobs by up to 30% compared to traditional scheduling methods.

Original language	English
Title of host publication	Proceedings of the 43rd Chinese Control Conference, CCC 2024
Editors	Jing Na, Jian Sun
Publisher	IEEE Computer Society
Pages	2588-2593
Number of pages	6
ISBN (Electronic)	9789887581581
DOI	https://doi.org/10.23919/CCC63176.2024.10662729
Publication status	Published - 2024
Event	43rd Chinese Control Conference, CCC 2024 - Kunming, China. Duration: 28 Jul 2024 → 31 Jul 2024

Publication series

Name	Chinese Control Conference, CCC
ISSN (Print)	1934-1768
ISSN (Electronic)	2161-2927

Conference

Conference	43rd Chinese Control Conference, CCC 2024
Country/Territory	China
City	Kunming
Period	28/07/24 → 31/07/24

Access to document

10.23919/CCC63176.2024.10662729

Other files and links

Link to publication in Scopus

Cite this

@inproceedings{891a6b86110e43968aa2f235107f4f3f,

title = "Octopus: An End-to-end Multi-DAG Scheduling Method Based on Deep Reinforcement Learning",

abstract = "With the rapid growth of cloud computing, more and more vendors are deploying their services to the cloud. Efficient job scheduling is essential for enhancing system operation performance. These services, represented as Directed Acyclic Graphs (DAGs), usually have intricate dependencies. Existing research has limitations in solving the multi-DAG job scheduling problem and often overlooks end-to-end scheduling directly from tasks to servers. For example, scheduling each job individually without considering the overall information of all jobs might lead to an extended total completion time. To address these issues, this paper proposes Octopus, an intelligent end-to-end multi-DAG jobs scheduling algorithm based on deep reinforcement learning. Octopus is designed to address the challenges of dynamic and large input dimensions in the multi-DAG scheduling problem. A graph neural network feature extraction module is designed to extract the topological structure of multi-DAG jobs. The improved kernel-based network is then used to handle dynamic inputs. Simulation experiments conducted on different scales of DAG jobs and servers demonstrate that our approach can reduce the overall completion time of multi-DAG jobs up to 30% compared to traditional scheduling methods.",

keywords = "Deep Reinforcement Learning, End-to-end Scheduling, Graph Neural Network, Job Scheduling",

author = "Yi Chang and Haosong Peng and Yufeng Zhan and Yuanqing Xia",

note = "Publisher Copyright: {\textcopyright} 2024 Technical Committee on Control Theory, Chinese Association of Automation.; 43rd Chinese Control Conference, CCC 2024 ; Conference date: 28-07-2024 Through 31-07-2024",

year = "2024",

doi = "10.23919/CCC63176.2024.10662729",

language = "English",

series = "Chinese Control Conference, CCC",

publisher = "IEEE Computer Society",

pages = "2588--2593",

editor = "Jing Na and Jian Sun",

booktitle = "Proceedings of the 43rd Chinese Control Conference, CCC 2024",

address = "United States",

}

Chang, Y, Peng, H, Zhan, Y & Xia, Y 2024, Octopus: An End-to-end Multi-DAG Scheduling Method Based on Deep Reinforcement Learning. in J Na & J Sun (eds), Proceedings of the 43rd Chinese Control Conference, CCC 2024. Chinese Control Conference, CCC, IEEE Computer Society, pp. 2588-2593, 43rd Chinese Control Conference, CCC 2024, Kunming, China, 28/07/24. https://doi.org/10.23919/CCC63176.2024.10662729

Octopus: An End-to-end Multi-DAG Scheduling Method Based on Deep Reinforcement Learning. / Chang, Yi; Peng, Haosong; Zhan, Yufeng et al.
Proceedings of the 43rd Chinese Control Conference, CCC 2024. ed. / Jing Na; Jian Sun. IEEE Computer Society, 2024. pp. 2588-2593 (Chinese Control Conference, CCC).

TY - GEN

T1 - Octopus

T2 - 43rd Chinese Control Conference, CCC 2024

AU - Chang, Yi

AU - Peng, Haosong

AU - Zhan, Yufeng

AU - Xia, Yuanqing

PY - 2024

Y1 - 2024

N2 - With the rapid growth of cloud computing, more and more vendors are deploying their services to the cloud. Efficient job scheduling is essential for enhancing system operation performance. These services, represented as Directed Acyclic Graphs (DAGs), usually have intricate dependencies. Existing research has limitations in solving the multi-DAG job scheduling problem and often overlooks end-to-end scheduling directly from tasks to servers. For example, scheduling each job individually without considering the overall information of all jobs might lead to an extended total completion time. To address these issues, this paper proposes Octopus, an intelligent end-to-end multi-DAG jobs scheduling algorithm based on deep reinforcement learning. Octopus is designed to address the challenges of dynamic and large input dimensions in the multi-DAG scheduling problem. A graph neural network feature extraction module is designed to extract the topological structure of multi-DAG jobs. The improved kernel-based network is then used to handle dynamic inputs. Simulation experiments conducted on different scales of DAG jobs and servers demonstrate that our approach can reduce the overall completion time of multi-DAG jobs up to 30% compared to traditional scheduling methods.

AB - With the rapid growth of cloud computing, more and more vendors are deploying their services to the cloud. Efficient job scheduling is essential for enhancing system operation performance. These services, represented as Directed Acyclic Graphs (DAGs), usually have intricate dependencies. Existing research has limitations in solving the multi-DAG job scheduling problem and often overlooks end-to-end scheduling directly from tasks to servers. For example, scheduling each job individually without considering the overall information of all jobs might lead to an extended total completion time. To address these issues, this paper proposes Octopus, an intelligent end-to-end multi-DAG jobs scheduling algorithm based on deep reinforcement learning. Octopus is designed to address the challenges of dynamic and large input dimensions in the multi-DAG scheduling problem. A graph neural network feature extraction module is designed to extract the topological structure of multi-DAG jobs. The improved kernel-based network is then used to handle dynamic inputs. Simulation experiments conducted on different scales of DAG jobs and servers demonstrate that our approach can reduce the overall completion time of multi-DAG jobs up to 30% compared to traditional scheduling methods.

KW - Deep Reinforcement Learning

KW - End-to-end Scheduling

KW - Graph Neural Network

KW - Job Scheduling

UR - http://www.scopus.com/inward/record.url?scp=85205488205&partnerID=8YFLogxK

U2 - 10.23919/CCC63176.2024.10662729

DO - 10.23919/CCC63176.2024.10662729

M3 - Conference contribution

AN - SCOPUS:85205488205

T3 - Chinese Control Conference, CCC

SP - 2588

EP - 2593

BT - Proceedings of the 43rd Chinese Control Conference, CCC 2024

A2 - Na, Jing

A2 - Sun, Jian

PB - IEEE Computer Society

Y2 - 28 July 2024 through 31 July 2024

ER -
