Fast Repair for Single Failure in Erasure Coding-Based Distributed Storage Systems

Huayu Zhang, Hui Li, Bing Zhu, Jun Chen
2014 IEEE 33rd International Symposium on Reliable Distributed Systems. Published 2014-10-06. DOI: 10.1109/SRDS.2014.21 (https://doi.org/10.1109/SRDS.2014.21)
Citations: 5

Abstract

To guarantee data reliability, distributed storage systems widely use erasure codes for their desirable storage properties. These codes have one drawback, however: repairing a failure requires a large amount of data, which leads to both heavy bandwidth consumption in the network and a high computational load on the replacement node. To address the repair bandwidth problem, researchers have derived the tradeoff between storage and repair traffic from network coding and proposed regenerating codes. However, the constructions of regenerating codes complicate both the system and the recovery computation. This paper therefore proposes a distributed repair method based on general erasure codes to reduce the burden of both recovery computation and network traffic. We observe that distributing the recovery computation among helpers spreads the calculation across nodes and accelerates repair in practical systems. Furthermore, by combining this technique with the network topology, we introduce a novel repair tree to minimize repair traffic. The repair tree is also derived from network coding. A preliminary analysis and evaluation of the repair tree's performance suggests that the storage-bandwidth bound of regenerating codes can be broken under this model.
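
The idea of spreading the recovery computation over helpers and aggregating along a tree can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a linear erasure code over GF(2^8) in which the lost block is a known linear combination of the helper blocks, and the names `gf_mul`, `centralized_repair`, `tree_repair`, the coefficients, and the tree shape are all hypothetical choices made for illustration.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11d           # reduce back into 8 bits
        b >>= 1
    return result


def scale(coeff: int, block: bytes) -> bytes:
    """Multiply every byte of a block by one field coefficient."""
    return bytes(gf_mul(coeff, x) for x in block)


def xor_blocks(x: bytes, y: bytes) -> bytes:
    """Add two equally sized blocks in GF(2^8)."""
    return bytes(a ^ b for a, b in zip(x, y))


def centralized_repair(coeffs, blocks) -> bytes:
    """Baseline: the replacement node fetches every helper block and
    performs all multiplications and additions itself."""
    acc = bytes(len(blocks[0]))
    for c, b in zip(coeffs, blocks):
        acc = xor_blocks(acc, scale(c, b))
    return acc


def tree_repair(node, children, coeffs, blocks) -> bytes:
    """Distributed variant: each helper scales its own block, adds the
    partial results received from its children, and forwards one block."""
    partial = scale(coeffs[node], blocks[node])
    for child in children.get(node, []):
        partial = xor_blocks(partial, tree_repair(child, children, coeffs, blocks))
    return partial


# Assumed example: four helpers, illustrative coefficients, and a tree in
# which helper 0 is the only node that sends to the replacement node.
blocks = [b"abcd", b"efgh", b"ijkl", b"mnop"]
coeffs = [1, 2, 3, 4]
children = {0: [1, 2], 1: [3]}

repaired = tree_repair(0, children, coeffs, blocks)
assert repaired == centralized_repair(coeffs, blocks)
```

In this sketch the replacement node receives a single block from helper 0 instead of one block from each of the four helpers, and every link in the tree carries exactly one block. How the tree is chosen from the real network topology is the subject of the paper and is not modeled here.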