Leverage similarity and locality to enhance fingerprint prefetching of data deduplication

Yongtao Zhou, Yuhui Deng, Junjie Xie
Published in: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)
Publication date: 2014-12-01
DOI: 10.1109/PADSW.2014.7097802 (https://doi.org/10.1109/PADSW.2014.7097802)
Citations: 8

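Abstract

Data deduplication is widely used in data backup systems because it significantly reduces storage capacity and network bandwidth requirements. However, deduplication performance gradually degrades as the volume of deduplicated data grows. The number of fingerprints increases significantly with the amount of backup data, so a large portion of the fingerprints must be stored on disk drives. Locating fingerprints then incurs frequent disk accesses, which stall the deduplication process. Furthermore, the fingerprints belonging to the same file may be scattered across the disk, which produces small random disk accesses and significantly degrades performance when those fingerprints are looked up. Additionally, a single fingerprint may appear only once during a backup process, so the lack of temporal locality yields a very low cache hit ratio. This paper proposes using file similarity to enhance fingerprint prefetching, thereby improving the cache hit ratio and deduplication performance. In addition, fingerprints are arranged sequentially in the order of the backup data stream to preserve locality and improve performance. Experimental results demonstrate that the proposed approach effectively reduces the number of fingerprint accesses that go to disk and lowers fingerprint query overhead, significantly alleviating the disk bottleneck of data deduplication.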
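The abstract does not say how similar files are detected or how fingerprints are laid out on disk, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes, hypothetically, that a file is represented by its smallest chunk fingerprint, that each file's fingerprints are stored as one contiguous segment in backup-stream order, and that a cache miss after prefetching is treated as a new chunk. The names `FingerprintStore`, `similarity_index`, and `disk_reads` are invented for illustration.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Return the SHA-1 hex digest of a chunk, used here as its fingerprint."""
    return hashlib.sha1(chunk).hexdigest()


class FingerprintStore:
    """Toy fingerprint store with similarity-driven prefetching (illustrative only)."""

    def __init__(self) -> None:
        # Simulated disk: one segment per backed-up file, holding that file's
        # fingerprints contiguously in backup-stream order.
        self.segments = []
        # Hypothetical similarity index: representative fingerprint -> segment number.
        self.similarity_index = {}
        # Count of simulated sequential segment reads.
        self.disk_reads = 0

    def backup(self, chunks: list[bytes]) -> int:
        """Deduplicate one file and return how many of its chunks were new."""
        fps = [fingerprint(c) for c in chunks]
        if not fps:
            return 0
        # Assumption: the smallest fingerprint represents the file for
        # similarity detection.
        rep = min(fps)
        cache = set()
        segment_no = self.similarity_index.get(rep)
        if segment_no is not None:
            # One sequential read prefetches every fingerprint of the similar file.
            cache.update(self.segments[segment_no])
            self.disk_reads += 1
        new_chunks = 0
        for fp in fps:
            if fp not in cache:
                # Simplification: a cache miss is treated as a new chunk.
                new_chunks += 1
                cache.add(fp)
        # Store this file's fingerprints in stream order to preserve locality.
        self.segments.append(fps)
        self.similarity_index[rep] = len(self.segments) - 1
        return new_chunks
```

In this sketch, backing up a second version of a file that shares its representative fingerprint with the first costs a single segment read, after which all shared fingerprints are served from the cache instead of triggering one random disk lookup each.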