Fast Failure Recovery in Vertex-Centric Distributed Graph Processing Systems

Wei Lu, Yanyan Shen, Tongtong Wang, Meihui Zhang*, H. V. Jagadish, Xiaoyong Du

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

12 Citations (Scopus)

Abstract

There is a growing need for distributed graph processing systems to employ many more compute nodes when processing graph-based Big Data applications, which, however, increases the chance of node failures. To address this issue, we propose a novel recovery scheme that accelerates the recovery process by parallelizing the recomputation. Once a failure occurs, all recomputation is confined to the subgraphs that originally resided on the failed compute nodes. When recovery starts, these subgraphs are reassigned to another set of compute nodes, where the recomputation over them is conducted in parallel. To minimize the recovery latency, we also develop a strategy for reassigning these subgraphs to the replacement compute nodes that properly accounts for both computation and communication costs. We integrate the proposed recovery scheme into the Giraph system, a widely used graph processing system. Experimental results over a variety of real graph datasets demonstrate that the proposed recovery scheme outperforms existing recovery methods by up to 30x on a cluster of 40 compute nodes.
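
The reassignment idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a simple greedy heuristic, written only to show what it means to weigh computation against communication when placing the subgraphs of failed nodes onto healthy ones. The `Partition` class, the `reassign` function, the `comm_weight` parameter, and the cost model (recomputation cost plus traffic to nodes other than the host) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A subgraph that resided on a failed node (hypothetical model)."""
    pid: int
    compute_cost: float  # assumed estimate of the recomputation cost
    # node id -> assumed message volume exchanged with that node
    traffic: dict = field(default_factory=dict)


def reassign(partitions, healthy_nodes, comm_weight=1.0):
    """Greedy placement (an illustrative heuristic, not the paper's method).

    Partitions are visited from most to least expensive; each one goes to
    the node whose load after placement would be smallest.
    """
    load = {node: 0.0 for node in healthy_nodes}
    placement = {}
    for part in sorted(partitions, key=lambda p: p.compute_cost, reverse=True):

        def cost_on(node):
            # Traffic to the hosting node becomes local, so only traffic
            # to the other nodes is charged as communication cost.
            remote = sum(v for n, v in part.traffic.items() if n != node)
            return load[node] + part.compute_cost + comm_weight * remote

        best = min(healthy_nodes, key=cost_on)
        load[best] = cost_on(best)
        placement[part.pid] = best
    return placement, load


if __name__ == "__main__":
    failed = [
        Partition(0, 10.0, {"n1": 4.0}),
        Partition(1, 6.0, {"n2": 3.0}),
        Partition(2, 5.0),
    ]
    placement, load = reassign(failed, ["n1", "n2"])
    print(placement)  # {0: 'n1', 1: 'n2', 2: 'n2'}
    print(load)       # {'n1': 10.0, 'n2': 11.0}
```

In this toy run, partitions 0 and 1 each land on the node they communicate with most, and partition 2 goes to the less loaded node, so the recomputation is spread over both healthy nodes instead of being serialized on one.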

Original language: English
Article number: 8371278
Pages (from-to): 733-746
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 31
Issue number: 4
Publication status: Published - 1 Apr 2019
