All-in-one: Graph processing in RDBMSs revisited

Kangfei Zhao, Jeffrey Xu Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

33 Citations (Scopus)

Abstract

To support analytics on massive graphs such as online social networks, RDF, and the Semantic Web, many new graph algorithms have been designed to query graphs for specific problems, and many distributed graph processing systems have been developed to support graph querying by programming. In this paper, we focus on RDBMSs, which have been well studied over decades for managing large datasets, and we revisit the issue of how an RDBMS can support graph processing at the SQL level. Our work is motivated by two facts: in real applications, many relations stored in an RDBMS are closely related to a graph and need to be used together to query it, and an RDBMS can query and manage data while that data is updated over time. To support graph processing, we propose 4 new relational algebra operations: MM-join, MV-join, anti-join, and union-by-update. MM-join and MV-join are join operations between two matrices and between a matrix and a vector, respectively, followed by aggregation over groups, given that a matrix or vector can be represented by a relation. Both work over a semiring, by which many graph algorithms can be supported. The anti-join removes nodes or edges from a graph when they are unnecessary for subsequent computation. The union-by-update handles value updates, as needed to compute PageRank, for example. The 4 new relational algebra operations can be defined by the 6 basic relational algebra operations with group-by and aggregation. We revisit SQL recursive queries and show that the 4 operations, together with others, are ensured to have a fixpoint, following the techniques studied in DATALOG, and we enhance the recursive with clause in SQL'99. We conduct extensive performance studies to test 10 graph algorithms using 9 large real graphs in 3 major RDBMSs. We show that RDBMSs are capable of dealing with graph processing in reasonable time. The focus of this work is at the SQL level. There is high potential to improve efficiency through main-memory RDBMSs, efficient parallel join processing, and new storage management.
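The description of MV-join as a join between a matrix and a vector followed by aggregation over groups can be illustrated with a minimal sketch. This is not the paper's implementation: the table names `E(F, T, ew)` and `V(ID, vw)`, the helper `mv_join`, and the use of SQLite through Python's `sqlite3` module are all assumptions made for illustration, and the (+, ×) semiring shown here is only one of the semirings the abstract alludes to.

```python
import sqlite3


def mv_join(edges, vector):
    """Sketch of a matrix-vector join under the (+, x) semiring.

    edges:  iterable of (src, dst, weight) rows, a relation holding matrix entries
    vector: iterable of (node, value) rows, a relation holding vector entries
    Returns {dst: sum over src of weight * value[src]}.
    Table and column names are hypothetical, chosen for this example.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE E (F INTEGER, T INTEGER, ew REAL)")
    con.execute("CREATE TABLE V (ID INTEGER, vw REAL)")
    con.executemany("INSERT INTO E VALUES (?, ?, ?)", edges)
    con.executemany("INSERT INTO V VALUES (?, ?)", vector)
    # Join on the shared node, multiply within each pair, then sum per group.
    rows = con.execute(
        "SELECT E.T, SUM(E.ew * V.vw) "
        "FROM E JOIN V ON E.F = V.ID "
        "GROUP BY E.T"
    ).fetchall()
    con.close()
    return dict(rows)


if __name__ == "__main__":
    edges = [(1, 2, 0.5), (1, 3, 0.5), (2, 3, 1.0), (3, 1, 1.0)]
    vector = [(1, 1.0), (2, 2.0), (3, 3.0)]
    print(mv_join(edges, vector))
```

Replacing `SUM(E.ew * V.vw)` with `MIN(E.ew + V.vw)` would switch the same join to the (min, +) semiring, which is one way to see how a single join-plus-aggregation pattern could cover several graph algorithms.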

Original language: English
Title of host publication: SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 1165-1180
Number of pages: 16
ISBN (Electronic): 9781450341974
DOI
Publication status: Published - 9 May 2017
Externally published
Event: 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017 - Chicago, United States
Duration: 14 May 2017 to 19 May 2017

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
Part F127746
ISSN (Print): 0730-8078

Conference

Conference: 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017
Country/Territory: United States
City: Chicago
Period: 14/05/17 to 19/05/17
