All-in-one: Graph processing in RDBMSs revisited

Kangfei Zhao; Jeffrey Xu Yu

doi:10.1145/3035918.3035943

Chinese University of Hong Kong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

33 Citations (Scopus)

Abstract

To support analytics on massive graphs such as online social networks, RDF, Semantic Web, etc., many new graph algorithms are designed to query graphs for a specific problem, and many distributed graph processing systems are developed to support graph querying by programming. In this paper, we focus on RDBMS, which has been well studied over decades to manage large datasets, and we revisit the issue of how RDBMS can support graph processing at the SQL level. Our work is motivated by the fact that there are many relations stored in RDBMS that are closely related to a graph in real applications and need to be used together to query the graph, and RDBMS is a system that can query and manage data while data may be updated over time. To support graph processing, in this work, we propose 4 new relational algebra operations, MM-join, MV-join, anti-join, and union-by-update. Here, MM-join and MV-join are join operations between two matrices and between a matrix and a vector, respectively, followed by aggregation computing over groups, given that a matrix/vector can be represented by a relation. Both deal with the semiring by which many graph algorithms can be supported. The anti-join removes nodes/edges in a graph when they are unnecessary for subsequent computation. The union-by-update addresses value updates to compute PageRank, for example. The 4 new relational algebra operations can be defined by the 6 basic relational algebra operations with group-by & aggregation. We revisit SQL recursive queries and show that the 4 operations with others are ensured to have a fixpoint, following the techniques studied in DATALOG, and enhance the recursive with clause in SQL'99. We conduct extensive performance studies to test 10 graph algorithms using 9 large real graphs in 3 major RDBMSs. We show that RDBMSs are capable of dealing with graph processing in reasonable time. The focus of this work is at the SQL level. There is high potential to improve the efficiency by main-memory RDBMSs, efficient join processing in parallel, and new storage management.
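
Illustrative sketch (not taken from the paper): the abstract describes MV-join as a join between a matrix relation and a vector relation followed by group-by aggregation over a semiring, union-by-update as a value update, and anti-join as removal of unneeded nodes/edges. The Python/sqlite3 code below shows one plausible reading of those three ideas. The table names E, V, N, the function names, the choice of the (min, +) semiring, and the use of INSERT OR REPLACE as a stand-in for union-by-update are all assumptions made for illustration; the paper's actual definitions and its enhanced recursive with clause may differ. The shortest-path loop assumes non-negative edge weights so that the iteration reaches a fixpoint.

```python
import sqlite3


def sssp_by_mv_join(edges, source):
    """Shortest distances from `source`, iterating an MV-join to a fixpoint.

    edges: iterable of (src, dst, weight) with non-negative weights (assumed).
    """
    con = sqlite3.connect(":memory:")
    # Hypothetical schema: E is the matrix as a relation, V the vector.
    con.execute("CREATE TABLE E (src INTEGER, dst INTEGER, w REAL)")
    con.execute("CREATE TABLE V (id INTEGER PRIMARY KEY, val REAL)")
    con.executemany("INSERT INTO E VALUES (?, ?, ?)", edges)
    con.execute("INSERT INTO V VALUES (?, 0.0)", (source,))
    while True:
        # MV-join reading: join E with V, then aggregate per group.
        # Semiring (min, +): "+" combines values, MIN aggregates them.
        candidates = con.execute(
            "SELECT E.dst, MIN(V.val + E.w) "
            "FROM E JOIN V ON E.src = V.id GROUP BY E.dst"
        ).fetchall()
        changed = False
        for node, dist in candidates:
            row = con.execute(
                "SELECT val FROM V WHERE id = ?", (node,)
            ).fetchone()
            if row is None or dist < row[0]:
                # Union-by-update reading: insert a new key or
                # overwrite the value stored under an existing key.
                con.execute(
                    "INSERT OR REPLACE INTO V VALUES (?, ?)", (node, dist)
                )
                changed = True
        if not changed:  # fixpoint: no value improved in this round
            break
    result = dict(con.execute("SELECT id, val FROM V"))
    con.close()
    return result


def topo_order_by_anti_join(nodes, edges):
    """Topological order obtained by repeatedly anti-joining N with E.

    Nodes on a cycle never become ready and are left out of the result.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE N (id INTEGER PRIMARY KEY)")
    con.execute("CREATE TABLE E (src INTEGER, dst INTEGER)")
    con.executemany("INSERT INTO N VALUES (?)", [(n,) for n in nodes])
    con.executemany("INSERT INTO E VALUES (?, ?)", edges)
    order = []
    while True:
        # Anti-join: keep nodes of N that match no E.dst (no incoming edge).
        ready = [r[0] for r in con.execute(
            "SELECT id FROM N WHERE NOT EXISTS "
            "(SELECT 1 FROM E WHERE E.dst = N.id) ORDER BY id"
        )]
        if not ready:
            break
        order.extend(ready)
        # Remove processed nodes and their outgoing edges from the graph.
        con.executemany("DELETE FROM N WHERE id = ?", [(n,) for n in ready])
        con.executemany("DELETE FROM E WHERE src = ?", [(n,) for n in ready])
    con.close()
    return order
```

Here the iteration is driven from Python rather than by a recursive SQL query, which keeps the sketch portable; it is meant only to make the join-then-aggregate pattern concrete.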

Original language	English
Title of host publication	SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data
Publisher	Association for Computing Machinery
Pages	1165-1180
Number of pages	16
ISBN (Electronic)	9781450341974
DOIs	https://doi.org/10.1145/3035918.3035943
Publication status	Published - 9 May 2017
Externally published	Yes
Event	2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017 - Chicago, United States; Duration: 14 May 2017 → 19 May 2017

Publication series

Name	Proceedings of the ACM SIGMOD International Conference on Management of Data
Volume	Part F127746
ISSN (Print)	0730-8078

Conference

Conference	2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017
Country/Territory	United States
City	Chicago
Period	14/05/17 → 19/05/17

Access to Document

10.1145/3035918.3035943

Cite this

@inproceedings{ca752c27011e4502a14d81fc20c260e3,

title = "All-in-one: Graph processing in RDBMSs revisited",

abstract = "To support analytics on massive graphs such as online social networks, RDF, Semantic Web, etc., many new graph algorithms are designed to query graphs for a specific problem, and many distributed graph processing systems are developed to support graph querying by programming. In this paper, we focus on RDBMS, which has been well studied over decades to manage large datasets, and we revisit the issue of how RDBMS can support graph processing at the SQL level. Our work is motivated by the fact that there are many relations stored in RDBMS that are closely related to a graph in real applications and need to be used together to query the graph, and RDBMS is a system that can query and manage data while data may be updated over time. To support graph processing, in this work, we propose 4 new relational algebra operations, MM-join, MV-join, anti-join, and union-by-update. Here, MM-join and MV-join are join operations between two matrices and between a matrix and a vector, respectively, followed by aggregation computing over groups, given that a matrix/vector can be represented by a relation. Both deal with the semiring by which many graph algorithms can be supported. The anti-join removes nodes/edges in a graph when they are unnecessary for subsequent computation. The union-by-update addresses value updates to compute PageRank, for example. The 4 new relational algebra operations can be defined by the 6 basic relational algebra operations with group-by \& aggregation. We revisit SQL recursive queries and show that the 4 operations with others are ensured to have a fixpoint, following the techniques studied in DATALOG, and enhance the recursive with clause in SQL'99. We conduct extensive performance studies to test 10 graph algorithms using 9 large real graphs in 3 major RDBMSs. We show that RDBMSs are capable of dealing with graph processing in reasonable time. The focus of this work is at the SQL level. There is high potential to improve the efficiency by main-memory RDBMSs, efficient join processing in parallel, and new storage management.",

author = "Zhao, Kangfei and Yu, {Jeffrey Xu}",

note = "Publisher Copyright: {\textcopyright} 2017 ACM.; 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017 ; Conference date: 14-05-2017 Through 19-05-2017",

year = "2017",

month = may,

day = "9",

doi = "10.1145/3035918.3035943",

language = "English",

series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",

publisher = "Association for Computing Machinery",

pages = "1165--1180",

booktitle = "SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data",

}

Zhao, K & Yu, JX 2017, All-in-one: Graph processing in RDBMSs revisited. in SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data. Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. Part F127746, Association for Computing Machinery, pp. 1165-1180, 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017, Chicago, United States, 14/05/17. https://doi.org/10.1145/3035918.3035943

All-in-one: Graph processing in RDBMSs revisited. / Zhao, Kangfei; Yu, Jeffrey Xu.
SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data. Association for Computing Machinery, 2017. p. 1165-1180 (Proceedings of the ACM SIGMOD International Conference on Management of Data; Vol. Part F127746).


TY - GEN

T1 - All-in-one: Graph processing in RDBMSs revisited

T2 - 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017

AU - Zhao, Kangfei

AU - Yu, Jeffrey Xu

PY - 2017/5/9

Y1 - 2017/5/9

AB - To support analytics on massive graphs such as online social networks, RDF, Semantic Web, etc., many new graph algorithms are designed to query graphs for a specific problem, and many distributed graph processing systems are developed to support graph querying by programming. In this paper, we focus on RDBMS, which has been well studied over decades to manage large datasets, and we revisit the issue of how RDBMS can support graph processing at the SQL level. Our work is motivated by the fact that there are many relations stored in RDBMS that are closely related to a graph in real applications and need to be used together to query the graph, and RDBMS is a system that can query and manage data while data may be updated over time. To support graph processing, in this work, we propose 4 new relational algebra operations, MM-join, MV-join, anti-join, and union-by-update. Here, MM-join and MV-join are join operations between two matrices and between a matrix and a vector, respectively, followed by aggregation computing over groups, given that a matrix/vector can be represented by a relation. Both deal with the semiring by which many graph algorithms can be supported. The anti-join removes nodes/edges in a graph when they are unnecessary for subsequent computation. The union-by-update addresses value updates to compute PageRank, for example. The 4 new relational algebra operations can be defined by the 6 basic relational algebra operations with group-by & aggregation. We revisit SQL recursive queries and show that the 4 operations with others are ensured to have a fixpoint, following the techniques studied in DATALOG, and enhance the recursive with clause in SQL'99. We conduct extensive performance studies to test 10 graph algorithms using 9 large real graphs in 3 major RDBMSs. We show that RDBMSs are capable of dealing with graph processing in reasonable time. The focus of this work is at the SQL level. There is high potential to improve the efficiency by main-memory RDBMSs, efficient join processing in parallel, and new storage management.

UR - http://www.scopus.com/inward/record.url?scp=85021199142&partnerID=8YFLogxK

U2 - 10.1145/3035918.3035943

DO - 10.1145/3035918.3035943

M3 - Conference contribution

AN - SCOPUS:85021199142

T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data

SP - 1165

EP - 1180

BT - SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data

PB - Association for Computing Machinery

Y2 - 14 May 2017 through 19 May 2017

ER -
