Fast Repair for Single Failure in Erasure Coding-Based Distributed Storage Systems

2014 IEEE 33rd International Symposium on Reliable Distributed Systems. Pub Date: 2014-10-06. DOI: 10.1109/SRDS.2014.21

Huayu Zhang, Hui Li, Bing Zhu, Jun Chen


Citations: 5

Abstract

To guarantee data reliability in distributed storage systems, erasure codes are widely used for their desirable storage properties. Nevertheless, these codes have one drawback: a large amount of data is needed to repair a failure, which results in both high bandwidth consumption in the network and heavy computational load on the replacement node. To address the repair bandwidth problem, researchers have derived the tradeoff between storage and repair traffic from network coding and proposed regenerating codes. However, the constructions of regenerating codes complicate both the system and the recovery computation. Hence, this paper proposes a distributed repair method based on general erasure codes to reduce the burden of both recovery computation and network traffic. We observe that distributing the recovery computation among helpers spreads the overall computation across nodes and accelerates repair in practical systems. Furthermore, by combining this technique with the network topology, we introduce a novel repair tree to minimize repair traffic. The repair tree is also derived from network coding. The performance of the repair tree is preliminarily analyzed and evaluated, and the results suggest that the storage-bandwidth bound of regenerating codes can be broken under this model.
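The idea of distributing the recovery computation among helpers can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a single-parity XOR code rather than a general erasure code, and the tree layout and all names (`star_repair`, `tree_repair`, `children`, `root_children`) are hypothetical. It only shows that when helpers combine partial results on the way to the replacement node, that node receives one block per direct child instead of one block per helper.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def star_repair(helper_blocks):
    """Conventional repair: the replacement node downloads every helper
    block and performs the whole decoding itself.
    Returns (recovered block, number of blocks the replacement node received)."""
    received = list(helper_blocks.values())
    return reduce(xor_blocks, received), len(received)


def tree_repair(helper_blocks, children, root_children):
    """Tree repair: each helper XORs its own block with the partial results
    of its children and forwards a single block toward the replacement node.
    `children` maps a helper to the helpers directly below it (assumed layout).
    Returns (recovered block, number of blocks the replacement node received)."""
    def partial(node):
        acc = helper_blocks[node]
        for child in children.get(node, []):
            acc = xor_blocks(acc, partial(child))
        return acc

    received = [partial(node) for node in root_children]
    return reduce(xor_blocks, received), len(received)


# Four data blocks plus one XOR parity block (assumed toy code).
data = {f"d{i}": bytes([i + 1] * 8) for i in range(4)}
parity = reduce(xor_blocks, data.values())

# Node holding d0 fails; the surviving blocks act as helpers.
lost = data["d0"]
helpers = {"d1": data["d1"], "d2": data["d2"], "d3": data["d3"], "p": parity}

# Hypothetical repair tree: replacement <- p <- {d1, d2}, and d2 <- d3.
children = {"p": ["d1", "d2"], "d2": ["d3"]}

star_block, star_inbound = star_repair(helpers)
tree_block, tree_inbound = tree_repair(helpers, children, root_children=["p"])

assert star_block == lost and tree_block == lost
print("blocks received by replacement node:", star_inbound, "(star) vs", tree_inbound, "(tree)")
```

In this toy layout every tree edge carries exactly one block, and the XOR work is shared by the helpers rather than concentrated on the replacement node. How the tree should be chosen for a given network topology is the subject of the paper and is not modeled here.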



