Contract & Expand: I/O Efficient SCCs Computing

Zhiwei Zhang, Lu Qin, Jeffrey Xu Yu

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

7 Citations (Scopus)

Abstract

As an important branch of big data processing, big graph processing is becoming increasingly popular in recent years. Strongly connected component (SCC) computation is a fundamental graph operation on directed graphs, where an SCC is a maximal subgraph S of a directed graph G in which every pair of nodes is reachable from each other in S. By contracting each SCC into a node, a large general directed graph can be represented by a small directed acyclic graph (DAG). In the literature, there are I/O efficient semi-external algorithms to compute all SCCs of a graph G, by assuming that all nodes of a graph G can fit in the main memory. However, many real graphs are large and even the nodes cannot reside entirely in the main memory. In this paper, we study new I/O efficient external algorithms to find all SCCs for a directed graph G whose nodes cannot fit entirely in the main memory. To overcome the deficiency of the existing external graph contraction based approach that usually cannot stop in finite iterations, and the external DFS based approach that will generate a large number of random I/Os, we explore a new contraction-expansion based approach. In the graph contraction phase, instead of contracting the whole graph as the contraction based approach, we only contract the nodes of a graph, which are much more selective. The contraction phase stops when all nodes of the graph can fit in the main memory, such that the semi-external algorithm can be used in SCC computation. In the graph expansion phase, as the graph is expanded in the reverse order as it is contracted, the SCCs of all nodes in the graph are computed. Both graph contraction phase and graph expansion phase use only I/O efficient sequential scans and external sorts of nodes/edges in the graph. Our algorithm leverages the efficiency of the semi-external SCC computation algorithm and usually stops in a small number of iterations. We further optimize our approach by reducing the size of nodes and edges of the contracted graph in each iteration. We conduct extensive experimental studies using both real and synthetic web-scale graphs to confirm the I/O efficiency of our approaches.

Original languageEnglish
Title of host publication2014 IEEE 30th International Conference on Data Engineering, ICDE 2014
PublisherIEEE Computer Society
Pages208-219
Number of pages12
ISBN (Print)9781479925544
DOIs
Publication statusPublished - 2014
Externally publishedYes
Event30th IEEE International Conference on Data Engineering, ICDE 2014 - Chicago, IL, United States
Duration: 31 Mar 20144 Apr 2014

Publication series

NameProceedings - International Conference on Data Engineering
ISSN (Print)1084-4627

Conference

Conference30th IEEE International Conference on Data Engineering, ICDE 2014
Country/TerritoryUnited States
CityChicago, IL
Period31/03/144/04/14

Fingerprint

Dive into the research topics of 'Contract & Expand: I/O Efficient SCCs Computing'. Together they form a unique fingerprint.

Cite this