TY - JOUR
T1 - RGraph
T2 - 基于RDMA的高效分布式图数据处理系统
AU - Cui, Peng Jie
AU - Yuan, Ye
AU - Li, Cen Hao
AU - Zhang, Can
AU - Wang, Guo Ren
N1 - Publisher Copyright:
© Copyright 2022, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
PY - 2022/3
Y1 - 2022/3
N2 - Graph is a significant data structure which describes the relationship between entries, and it is widely used in information science, physics, biology, environmental ecology and other scientific fields. Nowadays, with the growing magnitude of graph data, processing large-scale graph data using distributed system has become the popular, many specialized distributed systems, including Pregel, GraphX, PowerGraph, and Gemini have been proposed. However, compared with the current state-of-the-art shared-memory graph processing systems, these specialized distributed graph processing systems do not deliver satisfactory or stable performance advantages in processing real-world graph datasets. Several representative distributed graph processing systems are analyzed, and the major challenges that affect their performance are summarized. This study proposes RGraph, an effective distributed graph processing system based on RDMA. The key idea of RGraph is improving performance on top of making full use of the advantages of RDMA. For graph partition, RGraph adopts chunk-based partition to avoid destroying the native locality of the real-world graph, so as to ensure the locality-preserving vertex accesses. For workload, RGraph proposes a task migration mechanism based on RDMA one-side READ and a fine-grained task preemption method among threads to ensure the dynamic load balance for inter-node and intra-node, so that all computing resources can be fully utilized. For communication, RGraph effectively encapsulates IB verbs and implements a concurrent RDMA communication stack satisfied graph computing semantics. Compared with traditional MPI, RGraph’s communication stack can reduce the latency up to 2.1 times for servers’ communication. Finally, five real-world large-scale graph datasets and one synthetic dataset are used to evaluation RGraph on an HPC cluster with eight servers, and the experiment shows that RGraph has obvious performance advantages. Compared with Powergraph, RGraph has 10.1-16.8 times performance improvement. And compared with the existing state-of-the-art CPU- based distributed graph processing system, RGraph still has 2.89-5.12 times performance improvement. Meanwhile, RGraph can still guarantee stable performance advantage on extremely skewed power-law graph.
AB - Graph is a significant data structure which describes the relationship between entries, and it is widely used in information science, physics, biology, environmental ecology and other scientific fields. Nowadays, with the growing magnitude of graph data, processing large-scale graph data using distributed system has become the popular, many specialized distributed systems, including Pregel, GraphX, PowerGraph, and Gemini have been proposed. However, compared with the current state-of-the-art shared-memory graph processing systems, these specialized distributed graph processing systems do not deliver satisfactory or stable performance advantages in processing real-world graph datasets. Several representative distributed graph processing systems are analyzed, and the major challenges that affect their performance are summarized. This study proposes RGraph, an effective distributed graph processing system based on RDMA. The key idea of RGraph is improving performance on top of making full use of the advantages of RDMA. For graph partition, RGraph adopts chunk-based partition to avoid destroying the native locality of the real-world graph, so as to ensure the locality-preserving vertex accesses. For workload, RGraph proposes a task migration mechanism based on RDMA one-side READ and a fine-grained task preemption method among threads to ensure the dynamic load balance for inter-node and intra-node, so that all computing resources can be fully utilized. For communication, RGraph effectively encapsulates IB verbs and implements a concurrent RDMA communication stack satisfied graph computing semantics. Compared with traditional MPI, RGraph’s communication stack can reduce the latency up to 2.1 times for servers’ communication. Finally, five real-world large-scale graph datasets and one synthetic dataset are used to evaluation RGraph on an HPC cluster with eight servers, and the experiment shows that RGraph has obvious performance advantages. Compared with Powergraph, RGraph has 10.1-16.8 times performance improvement. And compared with the existing state-of-the-art CPU- based distributed graph processing system, RGraph still has 2.89-5.12 times performance improvement. Meanwhile, RGraph can still guarantee stable performance advantage on extremely skewed power-law graph.
KW - Distributed
KW - Dynamic load balance
KW - Graph processing system
KW - High performance
KW - RDMA
KW - RDMA communication model
UR - http://www.scopus.com/inward/record.url?scp=85126962179&partnerID=8YFLogxK
U2 - 10.13328/j.cnki.jos.006449
DO - 10.13328/j.cnki.jos.006449
M3 - 文章
AN - SCOPUS:85126962179
SN - 1000-9825
VL - 33
SP - 1018
EP - 1042
JO - Ruan Jian Xue Bao/Journal of Software
JF - Ruan Jian Xue Bao/Journal of Software
IS - 3
ER -