TY - GEN
T1 - I/O cost minimization
T2 - 15th International Conference on Extending Database Technology, EDBT 2012
AU - Zhang, Zhiwei
AU - Yu, Jeffrey Xu
AU - Qin, Lu
AU - Zhu, Qing
AU - Zhou, Xiaofang
PY - 2012
Y1 - 2012
N2 - Given a directed graph G, a reachability query (u, v) asks whether there exists a path from a node u to a node v in G. The existing studies support reachability queries using indexing techniques, where both the graph and the index are required to reside in main memory. However, they cannot handle reachability queries on massive graphs, when the graph and the index cannot be entirely held in memory because of the high I/O cost. In this paper, we focus on how to minimize the I/O cost when answering reachability queries on massive graphs that cannot reside entirely in memory. First, we propose a new Yes-Label scheme, as a complement of the No-Label used in GRAIL [23], to reduce the number of intermediate results generated. Second, we show how to minimize the number of I/Os using a heap-on-disk data structure when traversing a graph. We also propose new methods to partition the heap-on-disk, in order to ensure that only sequential I/Os are performed. Third, we analyze our approaches and show how to extend our approaches to answer multiple reachability queries effectively. Finally, we conducted extensive performance studies on both large synthetic and large real graphs, and confirm the efficiency of our approaches.
AB - Given a directed graph G, a reachability query (u, v) asks whether there exists a path from a node u to a node v in G. The existing studies support reachability queries using indexing techniques, where both the graph and the index are required to reside in main memory. However, they cannot handle reachability queries on massive graphs, when the graph and the index cannot be entirely held in memory because of the high I/O cost. In this paper, we focus on how to minimize the I/O cost when answering reachability queries on massive graphs that cannot reside entirely in memory. First, we propose a new Yes-Label scheme, as a complement of the No-Label used in GRAIL [23], to reduce the number of intermediate results generated. Second, we show how to minimize the number of I/Os using a heap-on-disk data structure when traversing a graph. We also propose new methods to partition the heap-on-disk, in order to ensure that only sequential I/Os are performed. Third, we analyze our approaches and show how to extend our approaches to answer multiple reachability queries effectively. Finally, we conducted extensive performance studies on both large synthetic and large real graphs, and confirm the efficiency of our approaches.
UR - http://www.scopus.com/inward/record.url?scp=84863536693&partnerID=8YFLogxK
U2 - 10.1145/2247596.2247651
DO - 10.1145/2247596.2247651
M3 - Conference contribution
AN - SCOPUS:84863536693
SN - 9781450307901
T3 - ACM International Conference Proceeding Series
SP - 468
EP - 479
BT - Advances in Database Technology - EDBT 2012
Y2 - 27 March 2012 through 30 March 2012
ER -