A work-stealing scheduling framework supporting fault tolerance

Yizhuo Wang, Weixing Ji, Feng Shi, Qi Zuo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Citations (Scopus)

Abstract

Fault tolerance and load balancing are critical points for executing long-running parallel applications on multicore clusters. This paper addresses both fault tolerance and load balancing on multicore clusters by presenting a novel work-stealing task scheduling framework which supports hardware fault tolerance. In this framework, both transient and permanent faults are detected and recovered at task granularity. We incorporate task-based fault detection and recovery mechanisms into a hierarchical work-stealing scheme to establish the framework. This framework provides low-overhead fault-tolerance and optimal load balancing by fully exploiting task parallelism.

Original language: English
Title of host publication: Proceedings - Design, Automation and Test in Europe, DATE 2013
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 695-700
Number of pages: 6
ISBN (Print): 9783981537000
Publication status: Published - 2013
Event: 16th Design, Automation and Test in Europe Conference and Exhibition, DATE 2013 - Grenoble, France
Duration: 18 Mar 2013 to 22 Mar 2013

Publication series

Name: Proceedings - Design, Automation and Test in Europe, DATE
ISSN (Print): 1530-1591

Conference

Conference: 16th Design, Automation and Test in Europe Conference and Exhibition, DATE 2013
Country/Territory: France
City: Grenoble
Period: 18/03/13 to 22/03/13
