I/O efficient: Computing sccs in massive graphs

  • Zhiwei Zhang
  • , Jeffrey Xu Yu
  • , Lu Qin
  • , Lijun Chang
  • , Xuemin Lin

科研成果: 书/报告/会议事项章节会议稿件同行评审

18 引用 (Scopus)

摘要

A strongly connected component (SCC) is a maximal subgraph of a directed graph G in which every pair of nodes are reachable from each other in the SCC. With such a property, a general directed graph can be represented by a directed acyclic graph (DAG) by contracting an SCC of G to a node in DAG. In many real applications that need graph pattern matching, topological sorting, or reachability query processing, the best way to deal with a general directed graph is to deal with its DAG representation. Therefore, finding all SCCs in a directed graph G is a critical operation. The existing in-memory algorithms based on depth first search (DFS) can find all SCCs in linear time w.r.t. the size of a graph. However, when a graph cannot resident entirely in the main memory, the existing external or semi-external algorithms to find all SCCs have limitation to achieve high I/O efficiency. In this paper, we study new I/O efficient semi-external algorithms to find all SCCs for a massive directed graph G that cannot reside in main memory entirely. To overcome the deficiency of the existing DFSbased semi-external algorithm that heavily relies on a total order, we explore a weak order based on which we investigate new algorithms. We propose a new two phase algorithm, namely, tree construction and tree search. In the tree construction phase, a spanning tree of G can be constructed in bounded sequential scans of G. In the tree search phase, it needs to sequentially scan the graph once to find all SCCs. In addition, we propose a new single phase algorithm, which combines the tree construction and tree search phases into a single phase, with three new optimization techniques. They are early acceptance, early rejection, and batch processing. By the single phase algorithm with the new optimization techniques, we can significantly reduce the number of I/Os and CPU cost. We conduct extensive experimental studies using 4 real datasets including a massive real dataset, and several synthetic datasets to confirm the I/O efficiency of our approaches.

源语言英语
主期刊名SIGMOD 2013 - International Conference on Management of Data
出版商Association for Computing Machinery
181-192
页数12
ISBN(印刷版)9781450320375
DOI
出版状态已出版 - 2013
已对外发布
活动2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013 - New York, NY, 美国
期限: 22 6月 201327 6月 2013

出版系列

姓名Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN(印刷版)0730-8078

会议

会议2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013
国家/地区美国
New York, NY
时期22/06/1327/06/13

指纹

探究 'I/O efficient: Computing sccs in massive graphs' 的科研主题。它们共同构成独一无二的指纹。

引用此