Exploiting decoding computational locality to improve the I/O performance of an XOR-coded storage cluster under concurrent failures

Shiyi Li, Shenggang Wan, Di Chen, Qiang Cao, Changsheng Xie, Xubin He, Yuhua Guo, Ping Huang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

9 Citations (Scopus)

Abstract

In today's large data centers, hundreds to thousands of nodes are deployed as storage clusters to provide cloud and big data storage services, where failures are not rare. Therefore, efficient data redundancy technologies are needed to ensure data availability and reliability. Compared to traditional replication-based technology, erasure codes that tolerate multiple failures provide availability and reliability at a much lower cost. However, erasure-coded storage clusters, particularly XOR-coded ones, suffer from performance problems caused by degraded reads under concurrent node failures. With the traditional centralized decoding method, a large amount of extra data has to be transmitted over the network to service degraded reads. In particular, degraded reads in XOR-coded stripes with concurrent failures result in notably high network traffic. To address this problem, we propose a novel decoding approach called Local Decoding First (LDF). By exploiting the decoding computational locality of XOR-coded storage clusters, LDF significantly reduces the required network traffic and hence the access latency of degraded reads, thus improving I/O throughput. A prototype of LDF with two typical XOR codes has been implemented in the popular distributed file system HDFS on a storage cluster composed of 40 nodes. The experimental results show that LDF dramatically reduces the network traffic under concurrent node failures and thus improves both I/O throughput and access latency.
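
The abstract does not spell out how LDF works. The sketch below only illustrates the property it appeals to, assuming a lost block equals the XOR of a set of surviving blocks. Because XOR is associative and commutative, a node holding several of those blocks can combine them locally and send one partial result instead of every block. The node names, block layout, and function names are hypothetical and are not taken from the paper or from HDFS.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def centralized_decode(node_blocks):
    """Ship every surviving block to one decoder, then XOR them all there.

    Returns (recovered_block, number_of_blocks_sent_over_the_network).
    """
    shipped = [block for blocks in node_blocks.values() for block in blocks]
    return xor_blocks(shipped), len(shipped)


def local_first_decode(node_blocks):
    """Each node XORs its own blocks first and ships a single partial result.

    Returns (recovered_block, number_of_blocks_sent_over_the_network).
    """
    partials = [xor_blocks(blocks) for blocks in node_blocks.values()]
    return xor_blocks(partials), len(partials)


# Hypothetical layout: the lost block is the XOR of five surviving blocks
# that happen to be stored on two nodes.
survivors = {
    "node_a": [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"],
    "node_b": [b"\xaa\xbb", b"\x55\x44"],
}

central, sent_central = centralized_decode(survivors)
local, sent_local = local_first_decode(survivors)

# Both strategies recover the same block; only the transfer count differs.
assert central == local
print(sent_central, sent_local)
```

In this toy layout the centralized path moves five blocks and the local-first path moves two. The actual saving in a real cluster depends on the code and on how blocks are placed, which this sketch does not model.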

Original language: English
Title of host publication: Proceedings - 2014 IEEE 33rd International Symposium on Reliable Distributed Systems, SRDS 2014
Publisher: IEEE Computer Society
Pages: 125-135
Number of pages: 11
ISBN (Electronic): 9781479955848
DOI
Publication status: Published - 2014
Externally published
Event: 33rd IEEE International Symposium on Reliable Distributed Systems, SRDS 2014 - Nara, Japan
Duration: 6 Oct 2014 to 9 Oct 2014

Publication series

Name: Proceedings of the IEEE Symposium on Reliable Distributed Systems
Volume: 2014-January
ISSN (Print): 1060-9857

Conference

Conference: 33rd IEEE International Symposium on Reliable Distributed Systems, SRDS 2014
Country/Territory: Japan
City: Nara
Period: 6 Oct 2014 to 9 Oct 2014
