宽局部可恢复码(lrc)的实际设计考虑

IF 2.1 3区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE ACM Transactions on Storage Pub Date : 2023-11-14 DOI:10.1145/3626198

Saurabh Kadekodi, Shashwat Silas, David Clausen, Arif Merchant

{"title":"宽局部可恢复码(lrc)的实际设计考虑","authors":"Saurabh Kadekodi, Shashwat Silas, David Clausen, Arif Merchant","doi":"10.1145/3626198","DOIUrl":null,"url":null,"abstract":"Most of the data in large-scale storage clusters is erasure coded. At exascale, optimizing erasure codes for low storage overhead, efficient reconstruction, and easy deployment is of critical importance. Locally recoverable codes (LRCs) have deservedly gained central importance in this field, because they can balance many of these requirements. In our work, we study wide LRCs; LRCs with large number of blocks per stripe and low storage overhead. These codes are a natural next step for practitioners to unlock higher storage savings, but they come with their own challenges. Of particular interest is their reliability, since wider stripes are prone to more simultaneous failures.We conduct a practically minded analysis of several popular and novel LRCs. We find that wide LRC reliability is a subtle phenomenon that is sensitive to several design choices, some of which are overlooked by theoreticians, and others by practitioners. Based on these insights, we construct novel LRCs called Uniform Cauchy LRCs, which show excellent performance in simulations and a 33% improvement in reliability on unavailability events observed by a wide LRC deployed in a Google storage cluster. We also show that these codes are easy to deploy in a manner that improves their robustness to common maintenance events. Along the way, we also give a remarkably simple and novel construction of distance-optimal LRCs (other constructions are also known), which may be of interest to theory-minded readers.","PeriodicalId":49113,"journal":{"name":"ACM Transactions on Storage","volume":"71 7","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2023-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Practical Design Considerations for Wide Locally Recoverable Codes (LRCs)\",\"authors\":\"Saurabh Kadekodi, Shashwat Silas, David Clausen, Arif Merchant\",\"doi\":\"10.1145/3626198\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Most of the data in large-scale storage clusters is erasure coded. At exascale, optimizing erasure codes for low storage overhead, efficient reconstruction, and easy deployment is of critical importance. Locally recoverable codes (LRCs) have deservedly gained central importance in this field, because they can balance many of these requirements. In our work, we study wide LRCs; LRCs with large number of blocks per stripe and low storage overhead. These codes are a natural next step for practitioners to unlock higher storage savings, but they come with their own challenges. Of particular interest is their reliability, since wider stripes are prone to more simultaneous failures.We conduct a practically minded analysis of several popular and novel LRCs. We find that wide LRC reliability is a subtle phenomenon that is sensitive to several design choices, some of which are overlooked by theoreticians, and others by practitioners. Based on these insights, we construct novel LRCs called Uniform Cauchy LRCs, which show excellent performance in simulations and a 33% improvement in reliability on unavailability events observed by a wide LRC deployed in a Google storage cluster. We also show that these codes are easy to deploy in a manner that improves their robustness to common maintenance events. Along the way, we also give a remarkably simple and novel construction of distance-optimal LRCs (other constructions are also known), which may be of interest to theory-minded readers.\",\"PeriodicalId\":49113,\"journal\":{\"name\":\"ACM Transactions on Storage\",\"volume\":\"71 7\",\"pages\":\"\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2023-11-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Storage\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3626198\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Storage","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3626198","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 1

摘要

大规模存储集群中的大部分数据都是擦除编码。在百亿亿级，优化擦除码以实现低存储开销、高效重构和易于部署是至关重要的。本地可恢复代码(lrc)在这个领域理所当然地获得了中心重要性，因为它们可以平衡许多这些需求。在我们的工作中，我们研究了广泛的lrc;每个条带具有大量块和低存储开销的lrc。这些代码是从业者解锁更高存储节省的自然下一步，但它们也带来了自己的挑战。特别令人感兴趣的是它们的可靠性，因为更宽的条纹更容易同时发生故障。我们对几种流行的和新颖的lrc进行了实际的分析。我们发现，宽LRC可靠性是一种微妙的现象，它对几种设计选择很敏感，其中一些被理论家忽视，而另一些被实践者忽视。基于这些见解，我们构建了一种新的LRC，称为统一柯西LRC，它在模拟中表现出优异的性能，在谷歌存储集群中部署的广泛LRC观察到的不可用事件的可靠性提高了33%。我们还展示了这些代码很容易部署，从而提高了它们对常见维护事件的健壮性。在此过程中，我们还给出了距离最优lrc的一个非常简单和新颖的结构(其他结构也已知)，这可能会引起有理论头脑的读者的兴趣。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Practical Design Considerations for Wide Locally Recoverable Codes (LRCs)

Most of the data in large-scale storage clusters is erasure coded. At exascale, optimizing erasure codes for low storage overhead, efficient reconstruction, and easy deployment is of critical importance. Locally recoverable codes (LRCs) have deservedly gained central importance in this field, because they can balance many of these requirements. In our work, we study wide LRCs; LRCs with large number of blocks per stripe and low storage overhead. These codes are a natural next step for practitioners to unlock higher storage savings, but they come with their own challenges. Of particular interest is their reliability, since wider stripes are prone to more simultaneous failures.

We conduct a practically minded analysis of several popular and novel LRCs. We find that wide LRC reliability is a subtle phenomenon that is sensitive to several design choices, some of which are overlooked by theoreticians, and others by practitioners. Based on these insights, we construct novel LRCs called Uniform Cauchy LRCs, which show excellent performance in simulations and a 33% improvement in reliability on unavailability events observed by a wide LRC deployed in a Google storage cluster. We also show that these codes are easy to deploy in a manner that improves their robustness to common maintenance events. Along the way, we also give a remarkably simple and novel construction of distance-optimal LRCs (other constructions are also known), which may be of interest to theory-minded readers.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Storage COMPUTER SCIENCE, HARDWARE & ARCHITECTURE-COMPUTER SCIENCE, SOFTWARE ENGINEERING

CiteScore

4.20

自引率

5.90%

发文量

审稿时长

>12 weeks

期刊介绍： The ACM Transactions on Storage (TOS) is a new journal with an intent to publish original archival papers in the area of storage and closely related disciplines. Articles that appear in TOS will tend either to present new techniques and concepts or to report novel experiences and experiments with practical systems. Storage is a broad and multidisciplinary area that comprises of network protocols, resource management, data backup, replication, recovery, devices, security, and theory of data coding, densities, and low-power. Potential synergies among these fields are expected to open up new research directions.