ADMAD: Application-Driven Metadata Aware De-duplication Archival Storage System

Chuanyi Liu, Yingping Lu, Chunhui Shi, Guanlin Lu, D. Du, Dong-Sheng Wang
Published in: 2008 Fifth IEEE International Workshop on Storage Network Architecture and Parallel I/Os
Publication date: 2008-09-22
DOI: 10.1109/SNAPI.2008.11 (https://doi.org/10.1109/SNAPI.2008.11)
Citations: 66

Abstract

Current storage systems hold a large amount of duplicated or redundant data. Data de-duplication, which uses lossless data compression schemes to minimize duplicated data at the inter-file level, has therefore received broad attention in recent years. Research challenges remain in current approaches and storage systems, such as how to chunk files more efficiently and better exploit potential similarity and identity among dedicated applications, and how to store the chunks effectively and reliably on secondary storage devices. In this paper, we propose ADMAD, an application-driven metadata aware de-duplication archival storage system, which uses metadata from different levels of the I/O path to direct file partitioning into more meaningful data chunks (MC) and so maximally reduce inter-file level duplication. However, the resulting chunks vary in size, so storing them directly on storage devices may cause heavy fragmentation and a high percentage of random disk accesses, which is very inefficient. Therefore, in ADMAD, chunks are further packaged into fixed-size objects that serve as the storage units, which speeds up I/O and eases data management. Preliminary experiments show that the proposed system further reduces the required storage space compared with current methods (by 20% to nearly 50% across several datasets) and substantially improves write performance (by about 50%-70% on average).
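
The pipeline described in the abstract (metadata-directed chunking, duplicate elimination, and packing of variable-size chunks into fixed-size objects) can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not detail. The names `split_at_boundaries`, `ObjectStore`, `store_file`, and `restore_file` are hypothetical, the boundary offsets stand in for whatever metadata ADMAD actually reads from the I/O path, SHA-1 fingerprints are an assumed way to detect identical chunks, and the object size is deliberately tiny.

```python
from __future__ import annotations

import hashlib

OBJECT_SIZE = 64  # hypothetical object capacity in bytes (tiny, for illustration only)


def split_at_boundaries(data: bytes, boundaries: list[int]) -> list[bytes]:
    """Cut data at the given offsets.

    The offsets are an assumption: they stand in for boundaries that a
    metadata-aware chunker would derive from application or file-format metadata.
    """
    cuts = [0] + sorted(b for b in boundaries if 0 < b < len(data)) + [len(data)]
    # Skip zero-length slices caused by repeated offsets or empty input.
    return [data[s:e] for s, e in zip(cuts, cuts[1:]) if e > s]


class ObjectStore:
    """Packs unique variable-size chunks into fixed-capacity objects."""

    def __init__(self, object_size: int = OBJECT_SIZE):
        self.object_size = object_size
        self.objects: list[bytearray] = [bytearray()]
        # fingerprint -> (object id, offset inside the object, chunk length)
        self.index: dict[str, tuple[int, int, int]] = {}

    def put(self, chunk: bytes) -> str:
        """Store a chunk unless an identical one exists; return its fingerprint."""
        fp = hashlib.sha1(chunk).hexdigest()
        if fp in self.index:
            return fp  # duplicate chunk: nothing new is written
        if len(chunk) > self.object_size:
            raise ValueError("chunk does not fit into one object")
        # Open a new object when the current one cannot hold the chunk.
        if len(self.objects[-1]) + len(chunk) > self.object_size:
            self.objects.append(bytearray())
        obj_id = len(self.objects) - 1
        offset = len(self.objects[obj_id])
        self.objects[obj_id] += chunk
        self.index[fp] = (obj_id, offset, len(chunk))
        return fp

    def get(self, fp: str) -> bytes:
        """Read a chunk back by fingerprint."""
        obj_id, offset, length = self.index[fp]
        return bytes(self.objects[obj_id][offset:offset + length])


def store_file(store: ObjectStore, data: bytes, boundaries: list[int]) -> list[str]:
    """Chunk a file and store it; the returned fingerprint list is its recipe."""
    return [store.put(chunk) for chunk in split_at_boundaries(data, boundaries)]


def restore_file(store: ObjectStore, recipe: list[str]) -> bytes:
    """Rebuild a file from its recipe."""
    return b"".join(store.get(fp) for fp in recipe)
```

In this sketch, two files that differ only in a header but share a body store the body once, provided the boundary falls between header and body. That is the intuition behind cutting at meaningful positions. Appending chunks to the current object until it is full keeps writes sequential, which is the motivation the abstract gives for fixed-size storage units.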