Archiving cold data in warehouses with clustered network coding

Proceedings of the Eleventh European Conference on Computer Systems Pub Date : 2014-04-14 DOI:10.1145/2592798.2592816

Fabien André, Anne-Marie Kermarrec, E. L. Merrer, Nicolas Le Scouarnec, G. Straub, Alexandre van Kempen

{"title":"Archiving cold data in warehouses with clustered network coding","authors":"Fabien André, Anne-Marie Kermarrec, E. L. Merrer, Nicolas Le Scouarnec, G. Straub, Alexandre van Kempen","doi":"10.1145/2592798.2592816","DOIUrl":null,"url":null,"abstract":"Modern storage systems now typically combine plain replication and erasure codes to reliably store large amount of data in datacenters. Plain replication allows a fast access to popular data, while erasure codes, e.g., Reed-Solomon codes, provide a storage-efficient alternative for archiving less popular data. Although erasure codes are now increasingly employed in real systems, they experience high overhead during maintenance, i.e., upon failures, typically requiring files to be decoded before being encoded again to repair the encoded blocks stored at the faulty node. In this paper, we propose a novel erasure code system, tailored for networked archival systems. The efficiency of our approach relies on the joint use of random codes and a clustered placement strategy. Our repair protocol leverages network coding techniques to reduce by 50% the amount of data transferred during maintenance, by repairing several cluster files simultaneously. We demonstrate both through an analysis and extensive experimental study conducted on a public testbed that our approach significantly decreases both the bandwidth overhead during the maintenance process and the time to repair lost data. We also show that using a non-systematic code does not impact the throughput, and comes only at the price of a higher CPU usage. Based on these results, we evaluate the impact of this higher CPU consumption on different configurations of data coldness by determining whether the cluster's network bandwidth dedicated to repair or CPU dedicated to decoding saturates first.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"11 1","pages":"21:1-21:14"},"PeriodicalIF":0.0000,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Eleventh European Conference on Computer Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2592798.2592816","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 26

Abstract

Modern storage systems now typically combine plain replication and erasure codes to reliably store large amount of data in datacenters. Plain replication allows a fast access to popular data, while erasure codes, e.g., Reed-Solomon codes, provide a storage-efficient alternative for archiving less popular data. Although erasure codes are now increasingly employed in real systems, they experience high overhead during maintenance, i.e., upon failures, typically requiring files to be decoded before being encoded again to repair the encoded blocks stored at the faulty node. In this paper, we propose a novel erasure code system, tailored for networked archival systems. The efficiency of our approach relies on the joint use of random codes and a clustered placement strategy. Our repair protocol leverages network coding techniques to reduce by 50% the amount of data transferred during maintenance, by repairing several cluster files simultaneously. We demonstrate both through an analysis and extensive experimental study conducted on a public testbed that our approach significantly decreases both the bandwidth overhead during the maintenance process and the time to repair lost data. We also show that using a non-systematic code does not impact the throughput, and comes only at the price of a higher CPU usage. Based on these results, we evaluate the impact of this higher CPU consumption on different configurations of data coldness by determining whether the cluster's network bandwidth dedicated to repair or CPU dedicated to decoding saturates first.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

用集群网络编码对仓库中的冷数据进行归档

现代存储系统现在通常结合了纯复制和擦除代码，以可靠地存储数据中心的大量数据。普通复制允许快速访问流行数据，而擦除代码，例如Reed-Solomon代码，为归档不太流行的数据提供了一种存储效率高的替代方案。尽管现在在实际系统中越来越多地使用擦除码，但在维护期间，即在出现故障时，通常需要先对文件进行解码，然后再对存储在故障节点上的编码块进行修复。在本文中，我们提出了一个新的擦除码系统，为网络化的档案系统量身定制。我们的方法的效率依赖于随机代码和聚类放置策略的联合使用。我们的修复协议利用网络编码技术，通过同时修复多个集群文件，在维护期间减少50%的数据传输量。我们通过在公共测试平台上进行的分析和广泛的实验研究证明，我们的方法显着降低了维护过程中的带宽开销和修复丢失数据的时间。我们还表明，使用非系统代码不会影响吞吐量，只会以更高的CPU使用率为代价。基于这些结果，我们通过确定集群专用于修复的网络带宽还是专用于解码的CPU首先饱和，来评估这种较高的CPU消耗对不同数据冷度配置的影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the Eleventh European Conference on Computer Systems

自引率

0.00%

发文量