TY - JOUR
T1 - Heterogeneous Task Scheduling Framework in Emerging Distributed Computing Systems
AU - Liu, Rui Qi
AU - Li, Bo Yang
AU - Gao, Yu Jin
AU - Li, Chang Sheng
AU - Zhao, Heng Tai
AU - Jin, Fu Sheng
AU - Li, Rong Hua
AU - Wang, Guo Ren
N1 - Publisher Copyright:
© Copyright 2022, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
PY - 2022/3
Y1 - 2022/3
N2 - With the rapid development of big data and machine learning, distributed big data computing engines for machine learning have emerged. These systems support both batch distributed learning and incremental learning and verification, with low latency and high performance. However, some of them adopt a random task scheduling strategy that ignores the performance differences between nodes, which easily leads to uneven load and performance degradation. In addition, for some tasks, scheduling fails if the resource requirements are not met. To address these problems, a heterogeneous task scheduling framework is proposed, which ensures that tasks are executed, and executed efficiently. Specifically, for the task scheduling module, the framework introduces two strategies built around the heterogeneous computing resources of nodes: a probabilistic random scheduling strategy, resource-Pick_kx, and a deterministic smooth weighted round-robin algorithm. The resource-Pick_kx algorithm calculates a probability for each node according to its performance and schedules tasks randomly according to these probabilities. The higher a node's performance, the higher its probability, and the more likely a task is to be scheduled to it. The smooth weighted round-robin algorithm sets initial weights according to node performance and adjusts them smoothly during scheduling, so that each task is scheduled to the node with the highest performance. In addition, for scenarios in which the available resources do not meet a task's requirements, a container-based vertical expansion mechanism is proposed to customize task resources, create nodes that join the cluster, and schedule the task again. The performance of the framework is tested experimentally on benchmarks and public data sets. Compared with the current strategy, the performance of the proposed framework is improved by 10% to 20%.
AB - With the rapid development of big data and machine learning, distributed big data computing engines for machine learning have emerged. These systems support both batch distributed learning and incremental learning and verification, with low latency and high performance. However, some of them adopt a random task scheduling strategy that ignores the performance differences between nodes, which easily leads to uneven load and performance degradation. In addition, for some tasks, scheduling fails if the resource requirements are not met. To address these problems, a heterogeneous task scheduling framework is proposed, which ensures that tasks are executed, and executed efficiently. Specifically, for the task scheduling module, the framework introduces two strategies built around the heterogeneous computing resources of nodes: a probabilistic random scheduling strategy, resource-Pick_kx, and a deterministic smooth weighted round-robin algorithm. The resource-Pick_kx algorithm calculates a probability for each node according to its performance and schedules tasks randomly according to these probabilities. The higher a node's performance, the higher its probability, and the more likely a task is to be scheduled to it. The smooth weighted round-robin algorithm sets initial weights according to node performance and adjusts them smoothly during scheduling, so that each task is scheduled to the node with the highest performance. In addition, for scenarios in which the available resources do not meet a task's requirements, a container-based vertical expansion mechanism is proposed to customize task resources, create nodes that join the cluster, and schedule the task again. The performance of the framework is tested experimentally on benchmarks and public data sets. Compared with the current strategy, the performance of the proposed framework is improved by 10% to 20%.
KW - Autoscale
KW - Distributed computing
KW - Heterogeneous task
KW - Load balance
KW - Task scheduling
UR - http://www.scopus.com/inward/record.url?scp=85126992421&partnerID=8YFLogxK
U2 - 10.13328/j.cnki.jos.006451
DO - 10.13328/j.cnki.jos.006451
M3 - Article
AN - SCOPUS:85126992421
SN - 1000-9825
VL - 33
SP - 1005
EP - 1017
JO - Ruan Jian Xue Bao/Journal of Software
JF - Ruan Jian Xue Bao/Journal of Software
IS - 3
ER -