A work-stealing scheduling framework supporting fault tolerance

Yizhuo Wang, Weixing Ji, Feng Shi, Qi Zuo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Citations (Scopus)

Abstract

Fault tolerance and load balancing are critical points for executing long-running parallel applications on multicore clusters. This paper addresses both fault tolerance and load balancing on multicore clusters by presenting a novel work-stealing task scheduling framework which supports hardware fault tolerance. In this framework, both transient and permanent faults are detected and recovered at task granularity. We incorporate task-based fault detection and recovery mechanisms into a hierarchical work-stealing scheme to establish the framework. This framework provides low-overhead fault-tolerance and optimal load balancing by fully exploiting task parallelism.
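The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: per-worker task queues with stealing, plus recovery at task granularity. It is not the paper's implementation. It is a flat, single-threaded simulation rather than the hierarchical scheme the authors use, and every name in it (`TransientFault`, `run_with_recovery`, `schedule`, `make_flaky`, `failed_workers`, `max_retries`) is hypothetical. Transient faults are modelled as an exception that triggers re-execution of the affected task; a permanent fault is modelled as a worker that never runs again, so its queued tasks are picked up by stealing.

```python
from collections import deque


class TransientFault(Exception):
    """Hypothetical signal that one execution of a task was corrupted."""


def run_with_recovery(task, max_retries=3):
    """Re-execute a single task until it succeeds or retries run out."""
    for _ in range(max_retries + 1):
        try:
            return task()
        except TransientFault:
            continue  # recover by re-running only this task
    raise RuntimeError("task still failing after retries")


def schedule(queues, failed_workers=frozenset()):
    """Simulate work stealing over one deque of tasks per worker.

    Workers listed in failed_workers are treated as permanently failed:
    they never execute anything, so their tasks are reached by stealing.
    Returns a list of (worker_index, result) pairs in execution order.
    """
    live = [w for w in range(len(queues)) if w not in failed_workers]
    if not live:
        raise ValueError("no live workers")
    results = []
    while any(queues):
        for w in live:
            if queues[w]:
                task = queues[w].pop()  # owner takes its newest task
            else:
                victim = next(
                    (v for v in range(len(queues)) if queues[v]), None
                )
                if victim is None:
                    break  # nothing left anywhere
                task = queues[victim].popleft()  # thief takes oldest task
            results.append((w, run_with_recovery(task)))
    return results


def make_flaky(value, failures):
    """Build a task that raises TransientFault `failures` times, then succeeds."""
    state = {"left": failures}

    def task():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientFault()
        return value

    return task


if __name__ == "__main__":
    queues = [deque([make_flaky(1, 1), lambda: 2]), deque([lambda: 3])]
    # Worker 1 is treated as permanently failed; worker 0 steals its task.
    print(schedule(queues, failed_workers={1}))  # [(0, 2), (0, 1), (0, 3)]
```

In this sketch the owner pops from one end of its deque and a thief takes from the other, which is the usual work-stealing convention. Recovery touches only the task that failed, which is what keeps the cost of a fault proportional to one task rather than to the whole run.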

Original language: English
Title of host publication: Proceedings - Design, Automation and Test in Europe, DATE 2013
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 695-700
Number of pages: 6
ISBN (Print): 9783981537000
DOI: https://doi.org/10.7873/date.2013.150
Publication status: Published - 2013
Event: 16th Design, Automation and Test in Europe Conference and Exhibition, DATE 2013 - Grenoble, France
Duration: 18 Mar 2013 to 22 Mar 2013

Publication series

Name: Proceedings - Design, Automation and Test in Europe, DATE
ISSN (Print): 1530-1591

Conference

Conference: 16th Design, Automation and Test in Europe Conference and Exhibition, DATE 2013
Country/Territory: France
City: Grenoble
Period: 18/03/13 to 22/03/13

Cite this

Wang, Y., Ji, W., Shi, F., & Zuo, Q. (2013). A work-stealing scheduling framework supporting fault tolerance. In Proceedings - Design, Automation and Test in Europe, DATE 2013 (pp. 695-700). Article 6513596 (Proceedings - Design, Automation and Test in Europe, DATE). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.7873/date.2013.150