On Data Parallelism of Erasure Coding in Distributed Storage Systems

Jun Li, Baochun Li
2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), published 2017-06-05
DOI: 10.1109/ICDCS.2017.191
Citations: 9

Abstract

Deployed in various distributed storage systems, erasure coding has demonstrated its advantages of low storage overhead and high failure tolerance. Erasure-coded distributed storage systems typically choose systematic maximum distance separable (MDS) codes, since they achieve the optimal storage overhead while still allowing data to be read directly, without decoding. However, the data parallelism of existing MDS codes is limited: data can be read in parallel without decoding only from the specific servers that store the original data. In this paper, we propose Carousel codes, designed to allow data to be read from an arbitrary number of servers in parallel without decoding, while preserving the optimal storage overhead of MDS codes. Furthermore, Carousel codes can achieve the optimal network traffic when reconstructing an unavailable block. We have implemented a prototype of Carousel codes on Apache Hadoop. Our experimental results demonstrate that Carousel codes let MapReduce jobs finish in almost 50% less time and significantly reduce data access latency. They do so with comparable encoding and decoding throughput, and without any additional sacrifice in failure tolerance or in the network overhead of reconstructing unavailable data.
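The sketch below makes the parallelism limitation concrete with a toy systematic MDS code. It is not the Carousel construction, which the abstract does not describe. It assumes a Reed-Solomon-style code over the prime field GF(257) with one symbol per block; production systems typically work over GF(2^8) instead. The names `encode`, `decode`, and `P` are hypothetical. Data block i is the value at x = i of a polynomial of degree below k, and parity blocks are its values at x = k, ..., n-1. Any k of the n blocks therefore recover the data, but only the k data servers can serve it without decoding.

```python
# Toy systematic (n, k) MDS code (Reed-Solomon style) over GF(257).
# Hypothetical illustration only; NOT the Carousel code construction.
P = 257  # prime modulus; each symbol is an integer in [0, 256]


def _interpolate(points, x):
    """Evaluate at x, mod P, the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points (Lagrange interpolation)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, n):
    """Systematic encoding: server i < k stores data[i] unchanged;
    servers k..n-1 store parity symbols."""
    k = len(data)
    points = list(enumerate(data))  # data symbol i is the polynomial's value at x = i
    return list(data) + [_interpolate(points, x) for x in range(k, n)]


def decode(available, k):
    """Recover the k data symbols from any k surviving (server_index, symbol) pairs."""
    if len(available) < k:
        raise ValueError("need at least k blocks to decode")
    points = available[:k]
    return [_interpolate(points, x) for x in range(k)]


if __name__ == "__main__":
    data = [10, 20, 30, 40]        # k = 4 data blocks
    stripe = encode(data, n=6)     # 6 servers: 4 data + 2 parity
    # Direct (decode-free) reads are possible only from the k data servers.
    assert stripe[:4] == data
    # MDS property: any k = 4 of the 6 blocks suffice; here servers 0 and 2 are lost.
    survivors = [(i, stripe[i]) for i in (1, 3, 4, 5)]
    assert decode(survivors, k=4) == data
    print("storage overhead:", len(stripe) / len(data))  # 6 / 4
```

In this example the storage overhead is n/k = 6/4 = 1.5, the minimum possible for tolerating n - k = 2 failures. Reading the data in parallel without decoding can still use only the k = 4 data servers. According to the abstract, Carousel codes lift exactly this restriction while keeping the MDS storage overhead.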