ByteGAP: A Non-continuous Distributed Graph Computing System using Persistent Memory


Miaomiao Cheng, Jiujian Chen, Cheng Zhao^*, Cheng Chen, Yongmin Hu, Xiaoliang Cong, Liang Qin, Hexiang Lin, Rong Hua Li, Guoren Wang, Shuai Zhang, Lei Zhang

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to journal › Conference article › peer-review

Abstract

Graph computing systems play a critical role in a variety of industrial applications. This study examines ByteDance's graph computing system workload, which challenges the conventional notion of a one-shot, lightweight graph computing task that can scale to trillions of edges. The workload includes both small and large-scale tasks separated by a 1000-second runtime threshold. The majority of the workload is dominated by small-scale tasks submitted arbitrarily, but with high time-sensitive requirements. Large-scale tasks make up the bulk of computing resources and occur periodically. Therefore, the graph computing system must be capable of pausing running tasks and prioritizing more critical ones. In this paper, we introduce ByteGAP, a non-continuous graph computing system that leverages PMEM's unique features, such as durability, byte-addressability, memory-like access, lower latency, and high capacity. The non-continuous approach uses checkpointing mechanisms to achieve effective fault detection and recovery. ByteGAP provides two key contributions: (1) lightweight distributed checkpointing based on PMEM, (2) efficient dual-mode PMEM management for optimizing PMEM read and write operations. Moreover, we present a comprehensive evaluation method that demonstrates the system's ability to handle the challenges associated with large-scale computing tasks. The findings lay the foundation for future research in distributed graph computing systems and advocate for a non-continuous approach to graph computing.

Original language	English
Journal	CEUR Workshop Proceedings
Volume	3462
Publication status	Published - 2023
Event	Joint Workshops at the 49th International Conference on Very Large Data Bases, VLDBW 2023 - Vancouver, Canada
Duration	28 Aug 2023 → 1 Sept 2023

Keywords

graph
non-continuous graph processing
persistent memory

Cite this

Cheng, M., Chen, J., Zhao, C., Chen, C., Hu, Y., Cong, X., Qin, L., Lin, H., Li, R. H., Wang, G., Zhang, S., & Zhang, L. (2023). ByteGAP: A Non-continuous Distributed Graph Computing System using Persistent Memory. CEUR Workshop Proceedings, 3462.

@article{fd1ef560ee9c4535a983072eaa9c669d,

title = "ByteGAP: A Non-continuous Distributed Graph Computing System using Persistent Memory",

abstract = "Graph computing systems play a critical role in a variety of industrial applications. This study examines ByteDance's graph computing system workload, which challenges the conventional notion of a one-shot, lightweight graph computing task that can scale to trillions of edges. The workload includes both small and large-scale tasks separated by a 1000-second runtime threshold. The majority of the workload is dominated by small-scale tasks submitted arbitrarily, but with high time-sensitive requirements. Large-scale tasks make up the bulk of computing resources and occur periodically. Therefore, the graph computing system must be capable of pausing running tasks and prioritizing more critical ones. In this paper, we introduce ByteGAP, a non-continuous graph computing system that leverages PMEM's unique features, such as durability, byte-addressability, memory-like access, lower latency, and high capacity. The non-continuous approach uses checkpointing mechanisms to achieve effective fault detection and recovery. ByteGAP provides two key contributions: (1) lightweight distributed checkpointing based on PMEM, (2) efficient dual-mode PMEM management for optimizing PMEM read and write operations. Moreover, we present a comprehensive evaluation method that demonstrates the system's ability to handle the challenges associated with large-scale computing tasks. The findings lay the foundation for future research in distributed graph computing systems and advocate for a non-continuous approach to graph computing.",

keywords = "graph, non-continuous graph processing, persistent memory",

author = "Miaomiao Cheng and Jiujian Chen and Cheng Zhao and Cheng Chen and Yongmin Hu and Xiaoliang Cong and Liang Qin and Hexiang Lin and Li, {Rong Hua} and Guoren Wang and Shuai Zhang and Lei Zhang",

note = "Publisher Copyright: {\textcopyright} 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).; Joint Workshops at the 49th International Conference on Very Large Data Bases, VLDBW 2023 ; Conference date: 28-08-2023 Through 01-09-2023",

year = "2023",

language = "English",

volume = "3462",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - ByteGAP

T2 - Joint Workshops at the 49th International Conference on Very Large Data Bases, VLDBW 2023

AU - Cheng, Miaomiao

AU - Chen, Jiujian

AU - Zhao, Cheng

AU - Chen, Cheng

AU - Hu, Yongmin

AU - Cong, Xiaoliang

AU - Qin, Liang

AU - Lin, Hexiang

AU - Li, Rong Hua

AU - Wang, Guoren

AU - Zhang, Shuai

AU - Zhang, Lei

PY - 2023

Y1 - 2023

AB - Graph computing systems play a critical role in a variety of industrial applications. This study examines ByteDance's graph computing system workload, which challenges the conventional notion of a one-shot, lightweight graph computing task that can scale to trillions of edges. The workload includes both small and large-scale tasks separated by a 1000-second runtime threshold. The majority of the workload is dominated by small-scale tasks submitted arbitrarily, but with high time-sensitive requirements. Large-scale tasks make up the bulk of computing resources and occur periodically. Therefore, the graph computing system must be capable of pausing running tasks and prioritizing more critical ones. In this paper, we introduce ByteGAP, a non-continuous graph computing system that leverages PMEM's unique features, such as durability, byte-addressability, memory-like access, lower latency, and high capacity. The non-continuous approach uses checkpointing mechanisms to achieve effective fault detection and recovery. ByteGAP provides two key contributions: (1) lightweight distributed checkpointing based on PMEM, (2) efficient dual-mode PMEM management for optimizing PMEM read and write operations. Moreover, we present a comprehensive evaluation method that demonstrates the system's ability to handle the challenges associated with large-scale computing tasks. The findings lay the foundation for future research in distributed graph computing systems and advocate for a non-continuous approach to graph computing.

KW - graph

KW - non-continuous graph processing

KW - persistent memory

UR - http://www.scopus.com/inward/record.url?scp=85171288216&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85171288216

SN - 1613-0073

VL - 3462

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

Y2 - 28 August 2023 through 1 September 2023

ER -
