Fast Variable-Grained Resemblance Data Deduplication For Cloud Storage

Xuming Ye, Jia Tang, Wenlong Tian, Ruixuan Li, Weijun Xiao, Yuqing Geng, Zhiyong Xu
2021 IEEE International Conference on Networking, Architecture and Storage (NAS)
Published: 2021-10-01
DOI: 10.1109/nas51552.2021.9605398
Citations: 2

Abstract

With the prevalence of cloud storage, data deduplication has become a widely used technology for removing duplicate data across users and saving network bandwidth. Nevertheless, traditional data deduplication can hardly detect duplicate data among resemblance (similar) chunks. Recently, a resemblance data deduplication scheme called Finesse has been proposed to detect and remove duplicate data among similar chunks efficiently. However, we observe that the chunks following a similar chunk have a high chance of being similar as well, a data locality property, and vice versa. Processing these adjacent similar chunks at a small average chunk size increases the amount of metadata, which degrades the performance of the deduplication system. Moreover, existing resemblance data deduplication schemes ignore the performance impact of metadata. Therefore, we propose a fast variable-grained resemblance data deduplication scheme for cloud storage. It dynamically merges adjacent resemblance chunks or adjacent unique chunks, and splits the chunks located in the transition region between resemblance chunks and unique chunks. Finally, we implement a prototype and conduct a series of experiments on real-world datasets. The results show that our method dramatically reduces the metadata size while maintaining a high deduplication ratio.
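The metadata argument above can be illustrated with a minimal sketch of the merging step. It assumes each fine-grained chunk has already been classified as "similar" or "unique" (for example, by a Finesse-style resemblance detector) and that chunks arrive contiguous and in stream order. The `Chunk` and `Segment` types, the `merge_adjacent` function, and the `max_segment` cap are hypothetical names chosen for illustration, not the paper's implementation. The sketch covers only the merging of adjacent same-class chunks; the abstract does not say how chunks in the transition region are split, so that step is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    offset: int  # byte offset of the chunk in the input stream
    length: int  # chunk length in bytes
    kind: str    # hypothetical label: "similar" or "unique"


@dataclass
class Segment:
    offset: int      # byte offset of the first chunk in the segment
    length: int      # total bytes covered by the segment
    kind: str        # class shared by every chunk in the segment
    num_chunks: int  # how many fine-grained chunks were merged


def merge_adjacent(chunks: List[Chunk], max_segment: int = 64 * 1024) -> List[Segment]:
    """Merge runs of adjacent chunks of the same kind into coarser segments.

    Assumes chunks are contiguous and in stream order. A new segment starts
    whenever the kind changes (a similar/unique transition) or when adding
    the next chunk would exceed max_segment bytes (a hypothetical cap).
    Each segment needs one metadata entry instead of one per chunk.
    """
    segments: List[Segment] = []
    for c in chunks:
        last = segments[-1] if segments else None
        if (last is not None and last.kind == c.kind
                and last.length + c.length <= max_segment):
            # Same class as the running segment and still under the cap: extend it.
            last.length += c.length
            last.num_chunks += 1
        else:
            # Class transition or cap reached: start a new segment.
            segments.append(Segment(c.offset, c.length, c.kind, 1))
    return segments


if __name__ == "__main__":
    kinds = ["similar"] * 4 + ["unique"] * 3 + ["similar"] * 2
    chunks = [Chunk(i * 8192, 8192, k) for i, k in enumerate(kinds)]
    segments = merge_adjacent(chunks)
    print(len(chunks), "->", len(segments))  # 9 -> 3
```

With nine 8 KiB chunks in three same-class runs, the sketch emits three segments, so three metadata entries replace nine. The size cap is one plausible way to keep merged units bounded; the paper may use a different criterion.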