Abstract
Fault tolerance and load balancing are critical for executing long-running parallel applications on multicore clusters. This paper addresses both by presenting a novel work-stealing task scheduling framework that supports hardware fault tolerance. In this framework, both transient and permanent faults are detected and recovered at task granularity. We incorporate task-based fault detection and recovery mechanisms into a hierarchical work-stealing scheme to establish the framework. The framework provides low-overhead fault tolerance and optimal load balancing by fully exploiting task parallelism.
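The abstract does not give the paper's algorithms, so the following is only an illustrative sketch of the general idea of combining work stealing with recovery at task granularity. It is a flat, single-threaded simulation, not the hierarchical scheme the authors describe, and every name in it (`TransientFault`, `run`, `make_flaky`, `dead_workers`, `max_retries`) is a hypothetical choice made for this example. A transient fault is modelled as an exception that causes only the affected task to be re-executed; a permanent fault is modelled as a worker that never runs, whose queued tasks are migrated to live workers.

```python
from collections import deque
import random


class TransientFault(Exception):
    """Hypothetical signal: a task's execution was corrupted and may be retried."""


def run(tasks, n_workers=4, dead_workers=frozenset(), max_retries=3, seed=0):
    """Simulate work stealing with task-level recovery (illustrative only)."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(n_workers)]
    alive = [w for w in range(n_workers) if w not in dead_workers]
    if not alive:
        raise ValueError("at least one worker must be alive")

    # Initial round-robin distribution, including workers that are dead.
    for i, task in enumerate(tasks):
        queues[i % n_workers].append((i, task, 0))

    # Permanent fault (assumed model): tasks stranded on a dead worker
    # are migrated to a randomly chosen live worker.
    for w in sorted(dead_workers):
        while queues[w]:
            queues[rng.choice(alive)].append(queues[w].popleft())

    results = {}
    while any(queues[w] for w in alive):
        for w in alive:
            if queues[w]:
                # The owner takes work from one end of its own deque.
                item = queues[w].pop()
            else:
                victims = [v for v in alive if v != w and queues[v]]
                if not victims:
                    continue
                # A thief steals from the opposite end of a victim's deque.
                item = queues[rng.choice(victims)].popleft()
            idx, task, attempts = item
            try:
                results[idx] = task()
            except TransientFault:
                # Transient fault: re-execute only this task, up to max_retries.
                if attempts + 1 > max_retries:
                    raise
                queues[w].append((idx, task, attempts + 1))
    return [results[i] for i in range(len(tasks))]


def make_flaky(value, failures):
    """Build a task that raises TransientFault `failures` times, then succeeds."""
    state = {"left": failures}

    def task():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientFault()
        return value * value

    return task


# Usage: eight tasks, every second one fails once; worker 2 is permanently down.
squares = run([make_flaky(i, failures=i % 2) for i in range(8)],
              n_workers=4, dead_workers={2})
```

In this sketch recovery cost is limited to re-running the single failed task, which is the intuition behind task-granularity recovery; how the paper detects faults and bounds its overhead is not stated in the abstract.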
Original language | English |
---|---|
Title of host publication | Proceedings - Design, Automation and Test in Europe, DATE 2013 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 695-700 |
Number of pages | 6 |
ISBN (Print) | 9783981537000 |
DOI | https://doi.org/10.7873/date.2013.150 |
Publication status | Published - 2013 |
Event | 16th Design, Automation and Test in Europe Conference and Exhibition, DATE 2013 - Grenoble, France Duration: 18 Mar 2013 → 22 Mar 2013 |
Publication series
Name | Proceedings - Design, Automation and Test in Europe, DATE |
---|---|
ISSN (Print) | 1530-1591 |
Conference
Conference | 16th Design, Automation and Test in Europe Conference and Exhibition, DATE 2013 |
---|---|
Country/Territory | France |
City | Grenoble |
Period | 18/03/13 → 22/03/13 |
Cite this
Wang, Y., Ji, W., Shi, F., & Zuo, Q. (2013). A work-stealing scheduling framework supporting fault tolerance. In Proceedings - Design, Automation and Test in Europe, DATE 2013 (pp. 695-700). Article 6513596 (Proceedings - Design, Automation and Test in Europe, DATE). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.7873/date.2013.150