Task-based parallel programming model supporting fault tolerance

Yi Zhuo Wang*, Xu Chen, Wei Xing Ji, Yan Su, Xiao Jun Wang, Feng Shi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Task-based parallel programming models have become the mainstream approach for improving the performance of parallel computer systems by exploiting task parallelism. This paper presents a novel task-based parallel programming model that supports hardware fault tolerance. The model incorporates fault tolerance mechanisms into task-based parallel programming and aims to improve both system performance and reliability. It uses the task as the basic unit of scheduling, execution, fault detection, and recovery, and supports fault tolerance at the application level. A buffer-commit computation model is used for transient fault tolerance, and an application-level diskless checkpointing technique is employed for permanent fault tolerance. A fault-tolerant work-stealing scheduling scheme is adopted to achieve dynamic load balancing. Experimental results show that the proposed model provides hardware fault tolerance with low performance overhead.
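The abstract does not spell out how the buffer-commit computation model works, so the following is only a minimal sketch of the general idea under stated assumptions: a task writes into a private buffer, and its results are committed to shared state only if no transient fault was detected during execution; otherwise the buffer is discarded and the task is re-executed. The names `TransientFault`, `run_task_buffered`, and `make_flaky_increment`, the dictionary-based state, and the retry limit are all hypothetical and are not taken from the paper.

```python
import copy


class TransientFault(Exception):
    """Hypothetical signal from a fault detector that a task's run is suspect."""


def run_task_buffered(task, shared_state, max_retries=3):
    """Run `task` on a private copy of `shared_state`; commit only on success.

    Returns the number of attempts that were needed.
    Illustrative assumption, not the paper's actual mechanism.
    """
    for attempt in range(1, max_retries + 1):
        # All writes made by the task go to this private buffer.
        buffer = copy.deepcopy(shared_state)
        try:
            task(buffer)
        except TransientFault:
            # Discard the buffer; shared_state was never touched, so the
            # task can simply be re-executed from the same inputs.
            continue
        # Commit: publish the buffered results to the shared state.
        shared_state.clear()
        shared_state.update(buffer)
        return attempt
    raise RuntimeError("task did not complete within the retry limit")


def make_flaky_increment(fail_times):
    """Build a task that increments a counter but faults `fail_times` times."""
    remaining = [fail_times]

    def task(state):
        state["counter"] += 1  # partial write that happens before the fault
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientFault("injected fault")

    return task


if __name__ == "__main__":
    state = {"counter": 0}
    attempts = run_task_buffered(make_flaky_increment(1), state)
    print(attempts, state)
```

Because the faulty attempt only ever modifies its own buffer, the partial increment it performed is never visible in the shared state, which is the property that makes task-level re-execution safe in this sketch.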

Original language: English
Pages (from-to): 1789-1804
Number of pages: 16
Journal: Ruan Jian Xue Bao/Journal of Software
Volume: 27
Issue number: 7
Publication status: Published - 1 Jul 2016

Keywords

  • Fault tolerance
  • Load balancing
  • Parallel programming
  • Task parallelism
  • Work-stealing scheduling
