RGraph: 基于RDMA的高效分布式图数据处理系统

Peng Jie Cui, Ye Yuan*, Cen Hao Li, Can Zhang, Guo Ren Wang

*此作品的通讯作者

科研成果: 期刊稿件文章同行评审

5 引用 (Scopus)

摘要

Graph is a significant data structure which describes the relationship between entries, and it is widely used in information science, physics, biology, environmental ecology and other scientific fields. Nowadays, with the growing magnitude of graph data, processing large-scale graph data using distributed system has become the popular, many specialized distributed systems, including Pregel, GraphX, PowerGraph, and Gemini have been proposed. However, compared with the current state-of-the-art shared-memory graph processing systems, these specialized distributed graph processing systems do not deliver satisfactory or stable performance advantages in processing real-world graph datasets. Several representative distributed graph processing systems are analyzed, and the major challenges that affect their performance are summarized. This study proposes RGraph, an effective distributed graph processing system based on RDMA. The key idea of RGraph is improving performance on top of making full use of the advantages of RDMA. For graph partition, RGraph adopts chunk-based partition to avoid destroying the native locality of the real-world graph, so as to ensure the locality-preserving vertex accesses. For workload, RGraph proposes a task migration mechanism based on RDMA one-side READ and a fine-grained task preemption method among threads to ensure the dynamic load balance for inter-node and intra-node, so that all computing resources can be fully utilized. For communication, RGraph effectively encapsulates IB verbs and implements a concurrent RDMA communication stack satisfied graph computing semantics. Compared with traditional MPI, RGraph’s communication stack can reduce the latency up to 2.1 times for servers’ communication. Finally, five real-world large-scale graph datasets and one synthetic dataset are used to evaluation RGraph on an HPC cluster with eight servers, and the experiment shows that RGraph has obvious performance advantages. Compared with Powergraph, RGraph has 10.1-16.8 times performance improvement. And compared with the existing state-of-the-art CPU- based distributed graph processing system, RGraph still has 2.89-5.12 times performance improvement. Meanwhile, RGraph can still guarantee stable performance advantage on extremely skewed power-law graph.

投稿的翻译标题RGraph: Effective Distributed Graph Data Processing System Based on RDMA
源语言繁体中文
页(从-至)1018-1042
页数25
期刊Ruan Jian Xue Bao/Journal of Software
33
3
DOI
出版状态已出版 - 3月 2022

关键词

  • Distributed
  • Dynamic load balance
  • Graph processing system
  • High performance
  • RDMA
  • RDMA communication model

指纹

探究 'RGraph: 基于RDMA的高效分布式图数据处理系统' 的科研主题。它们共同构成独一无二的指纹。

引用此