Optimal Node Selection for Data Regeneration in Heterogeneous Distributed Storage Systems

Qingyuan Gong, Jiaqi Wang, Dongsheng Wei, Jin Wang, Xin Wang
Published in: 2015 44th International Conference on Parallel Processing (ICPP)
Publication date: 2015-09-01
DOI: 10.1109/ICPP.2015.48 (https://doi.org/10.1109/ICPP.2015.48)
Citations: 14

Abstract

Distributed storage systems introduce redundancy to protect data from node failures. After a storage node fails, the lost data should be regenerated at a replacement storage node as soon as possible to maintain the same level of redundancy. Minimizing this regeneration time is critical to the reliability of distributed storage systems. Existing work aims to reduce the regeneration time either by minimizing the regenerating traffic or by adjusting the regenerating traffic patterns, whereas the nodes participating in the regeneration are generally assumed to be given beforehand. However, real-world distributed storage systems usually exhibit heterogeneous link capacities, and the regeneration time depends heavily on the selection of the participating nodes. In this paper, we consider minimizing the regeneration time by selecting the participating nodes in heterogeneous networks. We propose optimal node selection algorithms for two cases: 1) the newcomer is not given, and 2) neither the newcomer nor the providers are given. Analysis shows that the optimal regeneration time can be achieved in each case. We then consider how allowing a flexible number of data blocks from each provider affects the regeneration time, and apply this observation to enhance our schemes. Experimental results show that our node selection schemes can significantly reduce the regeneration time compared with a scheme based on random node selection, especially in practical networks with heterogeneous link capacities.
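The abstract does not spell out the algorithms, so the following is only an illustrative sketch of why node selection matters, not the authors' method. It assumes a simple bottleneck model: every provider sends the same amount of data (`beta`) directly to the newcomer in parallel, so the regeneration time is set by the slowest provider-to-newcomer link. The function names, the `cap` capacity table, and the node labels are all hypothetical.

```python
def regeneration_time(cap, providers, newcomer, beta=1.0):
    """Assumed model: time is governed by the slowest provider-to-newcomer link."""
    return max(beta / cap[p][newcomer] for p in providers)


def select_newcomer(cap, providers, candidates, beta=1.0):
    """Case 1 (providers fixed): pick the candidate newcomer with the shortest time."""
    return min(candidates,
               key=lambda n: regeneration_time(cap, providers, n, beta))


def select_newcomer_and_providers(cap, survivors, candidates, d, beta=1.0):
    """Case 2 (nothing fixed): for each candidate newcomer, take the d surviving
    nodes with the highest-capacity links to it, then keep the best combination."""
    best = None
    for n in candidates:
        providers = sorted(survivors, key=lambda p: cap[p][n], reverse=True)[:d]
        t = regeneration_time(cap, providers, n, beta)
        if best is None or t < best[0]:
            best = (t, n, providers)
    return best


# Hypothetical link capacities: cap[provider][newcomer], in data units per second.
cap = {
    "A": {"X": 10, "Y": 40},
    "B": {"X": 50, "Y": 20},
    "C": {"X": 30, "Y": 25},
}

chosen = select_newcomer(cap, ["A", "B"], ["X", "Y"], beta=100)
t, newcomer, providers = select_newcomer_and_providers(
    cap, ["A", "B", "C"], ["X", "Y"], d=2, beta=100)
print(chosen, newcomer, providers, round(t, 2))
```

Under this model, fixing providers A and B makes Y the better newcomer, because X is held back by the slow A-to-X link. When the providers are also free to choose, X with providers B and C becomes the best combination. The sketch ignores relaying through intermediate nodes and unequal per-provider data amounts, both of which the abstract suggests the paper considers.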