基于文件奇偶校验的重复数据删除存储系统可靠性研究

Suzhen Wu, Huagao Luan, Bo Mao, Hong Jiang, Gen Niu, Hui Rao, Fang Yu, Jindong Zhou
{"title":"基于文件奇偶校验的重复数据删除存储系统可靠性研究","authors":"Suzhen Wu, Huagao Luan, Bo Mao, Hong Jiang, Gen Niu, Hui Rao, Fang Yu, Jindong Zhou","doi":"10.1109/SRDS.2018.00028","DOIUrl":null,"url":null,"abstract":"The reliability issue in deduplication-based storage systems has not received adequate attention. Existing approaches introduce data redundancy after files have been deduplicated, either by replication on critical data chunks, i.e., chunks with high reference count, or RAID schemes on unique data chunks, which means that these schemes are based on individual unique data chunks rather than individual files. This can leave individual files vulnerable to losses, particularly in the presence of transient and unrecoverable data chunk errors such as latent sector errors. To address this file reliability issue, this paper proposes a Per-File Parity (short for PFP) scheme to improve the reliability of deduplication-based storage systems. PFP computes the XOR parity within parity groups of data chunks of each file after the chunking process but before the data chunks are deduplicated. Therefore, PFP can provide parity redundancy protection for all files by intra-file recovery and a higher-level protection for data chunks with high reference counts by inter-file recovery. Our reliability analysis and extensive data-driven, failure-injection based experiments conducted on a prototype implementation of PFP show that PFP significantly outperforms the existing redundancy solutions, DTR and RCR, in system reliability, tolerating multiple data chunk failures and guaranteeing file availability upon multiple data chunk failures. Moreover, a performance evaluation shows that PFP only incurs an average of 5.7% performance degradation to the deduplication-based storage system.","PeriodicalId":219374,"journal":{"name":"2018 IEEE 37th Symposium on Reliable Distributed Systems (SRDS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Improving Reliability of Deduplication-Based Storage Systems with Per-File Parity\",\"authors\":\"Suzhen Wu, Huagao Luan, Bo Mao, Hong Jiang, Gen Niu, Hui Rao, Fang Yu, Jindong Zhou\",\"doi\":\"10.1109/SRDS.2018.00028\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The reliability issue in deduplication-based storage systems has not received adequate attention. Existing approaches introduce data redundancy after files have been deduplicated, either by replication on critical data chunks, i.e., chunks with high reference count, or RAID schemes on unique data chunks, which means that these schemes are based on individual unique data chunks rather than individual files. This can leave individual files vulnerable to losses, particularly in the presence of transient and unrecoverable data chunk errors such as latent sector errors. To address this file reliability issue, this paper proposes a Per-File Parity (short for PFP) scheme to improve the reliability of deduplication-based storage systems. PFP computes the XOR parity within parity groups of data chunks of each file after the chunking process but before the data chunks are deduplicated. Therefore, PFP can provide parity redundancy protection for all files by intra-file recovery and a higher-level protection for data chunks with high reference counts by inter-file recovery. Our reliability analysis and extensive data-driven, failure-injection based experiments conducted on a prototype implementation of PFP show that PFP significantly outperforms the existing redundancy solutions, DTR and RCR, in system reliability, tolerating multiple data chunk failures and guaranteeing file availability upon multiple data chunk failures. Moreover, a performance evaluation shows that PFP only incurs an average of 5.7% performance degradation to the deduplication-based storage system.\",\"PeriodicalId\":219374,\"journal\":{\"name\":\"2018 IEEE 37th Symposium on Reliable Distributed Systems (SRDS)\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE 37th Symposium on Reliable Distributed Systems (SRDS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SRDS.2018.00028\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 37th Symposium on Reliable Distributed Systems (SRDS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SRDS.2018.00028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

基于重复数据删除的存储系统的可靠性问题没有得到足够的重视。现有的方法在文件重复数据删除后引入数据冗余,要么通过在关键数据块(即具有高引用计数的块)上的复制,要么通过在唯一数据块上的RAID方案,这意味着这些方案基于单个唯一数据块,而不是单个文件。这可能使单个文件容易丢失,特别是在存在瞬时和不可恢复的数据块错误(如潜在扇区错误)的情况下。为了解决这个文件可靠性问题,本文提出了一个文件奇偶校验(PFP)方案来提高基于重复数据删除的存储系统的可靠性。PFP计算每个文件的数据块的奇偶组内的异或奇偶,在分块过程之后,但在数据块重复数据删除之前。因此,PFP可以通过文件内恢复为所有文件提供奇偶校验冗余保护,并通过文件间恢复为具有高引用计数的数据块提供更高级别的保护。我们对PFP的原型实现进行了可靠性分析和广泛的数据驱动、基于故障注入的实验,结果表明,PFP在系统可靠性、容忍多个数据块故障和保证多个数据块故障时文件可用性方面显著优于现有的冗余解决方案DTR和RCR。性能评估表明,PFP对基于重复数据删除的存储系统的平均性能下降仅为5.7%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Improving Reliability of Deduplication-Based Storage Systems with Per-File Parity
The reliability issue in deduplication-based storage systems has not received adequate attention. Existing approaches introduce data redundancy after files have been deduplicated, either by replication on critical data chunks, i.e., chunks with high reference count, or RAID schemes on unique data chunks, which means that these schemes are based on individual unique data chunks rather than individual files. This can leave individual files vulnerable to losses, particularly in the presence of transient and unrecoverable data chunk errors such as latent sector errors. To address this file reliability issue, this paper proposes a Per-File Parity (short for PFP) scheme to improve the reliability of deduplication-based storage systems. PFP computes the XOR parity within parity groups of data chunks of each file after the chunking process but before the data chunks are deduplicated. Therefore, PFP can provide parity redundancy protection for all files by intra-file recovery and a higher-level protection for data chunks with high reference counts by inter-file recovery. Our reliability analysis and extensive data-driven, failure-injection based experiments conducted on a prototype implementation of PFP show that PFP significantly outperforms the existing redundancy solutions, DTR and RCR, in system reliability, tolerating multiple data chunk failures and guaranteeing file availability upon multiple data chunk failures. Moreover, a performance evaluation shows that PFP only incurs an average of 5.7% performance degradation to the deduplication-based storage system.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Enabling State Estimation for Fault Identification in Water Distribution Systems Under Large Disasters Mobile Cloud-of-Clouds Storage Made Efficient: A Network Coding Based Approach Collective Attestation: for a Stronger Security in Embedded Networks Impact of Man-In-The-Middle Attacks on Ethereum PubSub-SGX: Exploiting Trusted Execution Environments for Privacy-Preserving Publish/Subscribe Systems
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1