TY - JOUR
T1 - ByteGAP
T2 - Joint Workshops at the 49th International Conference on Very Large Data Bases, VLDBW 2023
AU - Cheng, Miaomiao
AU - Chen, Jiujian
AU - Zhao, Cheng
AU - Chen, Cheng
AU - Hu, Yongmin
AU - Cong, Xiaoliang
AU - Qin, Liang
AU - Lin, Hexiang
AU - Li, Rong Hua
AU - Wang, Guoren
AU - Zhang, Shuai
AU - Zhang, Lei
N1 - Publisher Copyright: © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2023
Y1 - 2023
N2 - Graph computing systems play a critical role in a variety of industrial applications. This study examines the workload of ByteDance's graph computing system, which challenges the conventional notion of a one-shot, lightweight graph computing task that can scale to trillions of edges. The workload includes both small- and large-scale tasks, separated by a 1000-second runtime threshold. Small-scale tasks dominate the workload; they are submitted at arbitrary times but are highly time-sensitive. Large-scale tasks consume the bulk of computing resources and occur periodically. Therefore, the graph computing system must be capable of pausing running tasks and prioritizing more critical ones. In this paper, we introduce ByteGAP, a non-continuous graph computing system that leverages the distinctive features of persistent memory (PMEM), such as durability, byte-addressability, memory-like access, lower latency, and high capacity. The non-continuous approach uses checkpointing mechanisms to achieve effective fault detection and recovery. ByteGAP makes two key contributions: (1) lightweight distributed checkpointing based on PMEM, and (2) efficient dual-mode PMEM management that optimizes PMEM read and write operations. Moreover, we present a comprehensive evaluation method that demonstrates the system's ability to handle the challenges associated with large-scale computing tasks. The findings lay the foundation for future research in distributed graph computing systems and advocate for a non-continuous approach to graph computing.
KW - graph
KW - non-continuous graph processing
KW - persistent memory
UR - http://www.scopus.com/inward/record.url?scp=85171288216&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85171288216
SN - 1613-0073
VL - 3462
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 28 August 2023 through 1 September 2023
ER -