Fast Repair for Single Failure in Erasure Coding-Based Distributed Storage Systems
Huayu Zhang, Hui Li, Bing Zhu, Jun Chen
2014 IEEE 33rd International Symposium on Reliable Distributed Systems, 2014-10-06
DOI: 10.1109/SRDS.2014.21 (https://doi.org/10.1109/SRDS.2014.21)
Citations: 5
Abstract
Erasure codes are widely used in distributed storage systems to guarantee data reliability because of their desirable storage properties. Their main drawback is that repairing a single failure requires a large amount of data, which consumes substantial network bandwidth and places a heavy computational load on the replacement node. To address the repair bandwidth problem, researchers have derived the tradeoff between storage and repair traffic from network coding and proposed regenerating codes. However, constructing regenerating codes complicates both the system and the recovery computation. This paper therefore proposes a distributed repair method based on general erasure codes that reduces the burden of both recovery computation and network traffic. We observe that distributing the recovery computation among helpers spreads the calculation across nodes and speeds up repair in practical systems. Furthermore, by combining this technique with the network topology, we introduce a novel repair tree, also derived from network coding, to minimize repair traffic. A preliminary analysis and evaluation of the repair tree suggests that the storage-bandwidth bound of regenerating codes can be broken under this model.
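The idea of distributing recovery computation among helpers can be illustrated with a small sketch. It is not the paper's construction, which the abstract does not specify. It assumes a linear erasure code over GF(2^8) in which the lost block equals a coefficient-weighted sum of the helper blocks. The helper names, the coefficients, the tree shape, and the functions `star_repair` and `tree_repair` are all hypothetical. In the conventional case the replacement node pulls every helper block and combines them itself. In the tree case each helper scales its own block, adds the partial sums from its children, and forwards a single block toward the root.

```python
from typing import Dict, List, Tuple


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def scale(block: bytes, coeff: int) -> bytes:
    """Multiply every byte of a block by one field coefficient."""
    return bytes(gf_mul(x, coeff) for x in block)


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Add two equal-length blocks in GF(2^8) (addition is XOR)."""
    return bytes(x ^ y for x, y in zip(a, b))


def star_repair(blocks: Dict[str, bytes], coeffs: Dict[str, int]) -> Tuple[bytes, int]:
    """Conventional repair: the replacement node receives every helper block
    and performs the whole linear combination on its own."""
    acc = bytes(len(next(iter(blocks.values()))))  # all-zero block
    for node, block in blocks.items():
        acc = xor_blocks(acc, scale(block, coeffs[node]))
    return acc, len(blocks)  # second value: blocks received by the replacement node


def tree_repair(node: str,
                children: Dict[str, List[str]],
                blocks: Dict[str, bytes],
                coeffs: Dict[str, int],
                traffic: Dict[Tuple[str, str], int]) -> bytes:
    """Distributed repair: each helper scales its own block, adds the partial
    sums of its children, and sends exactly one block to its parent."""
    partial = scale(blocks[node], coeffs[node])
    for child in children.get(node, []):
        child_sum = tree_repair(child, children, blocks, coeffs, traffic)
        traffic[(child, node)] = 1  # one block crosses this link
        partial = xor_blocks(partial, child_sum)
    return partial


# Hypothetical helpers, coefficients, and tree (A is the root next to the replacement node).
blocks = {"A": b"\x01\x02\x03", "B": b"\x10\x20\x30",
          "C": b"\xaa\xbb\xcc", "D": b"\x07\x08\x09"}
coeffs = {"A": 3, "B": 7, "C": 1, "D": 200}
children = {"A": ["B", "C"], "C": ["D"]}

traffic: Dict[Tuple[str, str], int] = {}
direct, received_direct = star_repair(blocks, coeffs)
via_tree = tree_repair("A", children, blocks, coeffs, traffic)
print(via_tree == direct, received_direct, len(traffic))
```

In this sketch both strategies rebuild the same block, but with the tree the replacement node receives a single combined block from the root rather than one block per helper, and the multiplications are spread across the helpers. Whether total network traffic also drops depends on how the tree is mapped onto the physical topology, which this sketch does not model.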