DAM:一种数据所有权感知的多层重复数据删除方案

Yujuan Tan, D. Feng, Zhichao Yan, Guohui Zhou
{"title":"DAM:一种数据所有权感知的多层重复数据删除方案","authors":"Yujuan Tan, D. Feng, Zhichao Yan, Guohui Zhou","doi":"10.1109/NAS.2010.57","DOIUrl":null,"url":null,"abstract":"Beyond the storage savings brought by chunk-level de-duplication in backup and archiving systems, a prominent challenge facing this technology is how to efficiently and effectively identify the duplicate chunks. Most of the chunk fingerprints used to identify individual chunks are stored on disks due to the limited main memory capacity. Checking for chunk fingerprint match on disk for every input chunk is known to be a severe performance bottleneck for the backup process. On the other hand, our intuitions and analyses of real backup data both indicate that duplicate chunks tend to strongly concentrate according to the data ownership. Motivated by this observation and to avoid or alleviate the aforementioned backup performance bottleneck, we propose DAM, a dataownership-aware multi-layered de-duplication scheme that exploits the data chunks’ ownership and uses a tri-layered de-duplication approach to narrow the search space for duplicate chunks to reduce the total disk accesses. Our experimental results with real world datasets on DAM show it reduces the disk accesses by an average of 60.8% and shortens the de-duplication time by an average of 46.3%.","PeriodicalId":284549,"journal":{"name":"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage","volume":"464 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"DAM: A DataOwnership-Aware Multi-layered De-duplication Scheme\",\"authors\":\"Yujuan Tan, D. Feng, Zhichao Yan, Guohui Zhou\",\"doi\":\"10.1109/NAS.2010.57\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Beyond the storage savings brought by chunk-level de-duplication in backup and archiving systems, a prominent challenge facing this technology is how to efficiently and effectively identify the duplicate chunks. Most of the chunk fingerprints used to identify individual chunks are stored on disks due to the limited main memory capacity. Checking for chunk fingerprint match on disk for every input chunk is known to be a severe performance bottleneck for the backup process. On the other hand, our intuitions and analyses of real backup data both indicate that duplicate chunks tend to strongly concentrate according to the data ownership. Motivated by this observation and to avoid or alleviate the aforementioned backup performance bottleneck, we propose DAM, a dataownership-aware multi-layered de-duplication scheme that exploits the data chunks’ ownership and uses a tri-layered de-duplication approach to narrow the search space for duplicate chunks to reduce the total disk accesses. Our experimental results with real world datasets on DAM show it reduces the disk accesses by an average of 60.8% and shortens the de-duplication time by an average of 46.3%.\",\"PeriodicalId\":284549,\"journal\":{\"name\":\"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage\",\"volume\":\"464 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-07-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NAS.2010.57\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE Fifth International Conference on Networking, Architecture, and Storage","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NAS.2010.57","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

摘要

在备份和归档系统中,除了块级重复数据删除带来的存储节省之外,该技术面临的一个突出挑战是如何高效地识别重复的块。由于主内存容量有限,大多数用于识别单个块的块指纹都存储在磁盘上。对于备份过程来说,检查每个输入块在磁盘上的块指纹匹配是一个严重的性能瓶颈。另一方面,我们的直觉和对真实备份数据的分析都表明,根据数据所有权,重复块倾向于强烈集中。基于这一观察结果,为了避免或缓解上述备份性能瓶颈,我们提出了DAM,这是一种数据所有权感知的多层重复数据删除方案,它利用数据块的所有权,并使用三层重复数据删除方法来缩小重复数据块的搜索空间,以减少磁盘访问总量。我们在DAM上对真实数据集的实验结果表明,它平均减少了60.8%的磁盘访问,平均缩短了46.3%的重复数据删除时间。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
DAM: A DataOwnership-Aware Multi-layered De-duplication Scheme
Beyond the storage savings brought by chunk-level de-duplication in backup and archiving systems, a prominent challenge facing this technology is how to efficiently and effectively identify the duplicate chunks. Most of the chunk fingerprints used to identify individual chunks are stored on disks due to the limited main memory capacity. Checking for chunk fingerprint match on disk for every input chunk is known to be a severe performance bottleneck for the backup process. On the other hand, our intuitions and analyses of real backup data both indicate that duplicate chunks tend to strongly concentrate according to the data ownership. Motivated by this observation and to avoid or alleviate the aforementioned backup performance bottleneck, we propose DAM, a dataownership-aware multi-layered de-duplication scheme that exploits the data chunks’ ownership and uses a tri-layered de-duplication approach to narrow the search space for duplicate chunks to reduce the total disk accesses. Our experimental results with real world datasets on DAM show it reduces the disk accesses by an average of 60.8% and shortens the de-duplication time by an average of 46.3%.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Heterogeneous Multi-core Parallel SGEMM Performance Testing and Analysis on Cell/B.E Processor Stabilizing Path Modification of Power-Aware On/Off Interconnection Networks Modelling Speculative Prefetching for Hybrid Storage Systems Binomial Probability Redundancy Strategy for Multimedia Transmission Fast and Memory-Efficient Traffic Classification with Deep Packet Inspection in CMP Architecture
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1