ADMAD: Application-Driven Metadata Aware De-duplication Archival Storage System

2008 Fifth IEEE International Workshop on Storage Network Architecture and Parallel I/Os. Published: 2008-09-22. DOI: 10.1109/SNAPI.2008.11

Chuanyi Liu, Yingping Lu, Chunhui Shi, Guanlin Lu, D. Du, Dong-Sheng Wang


Citations: 66

Abstract

There is a huge amount of duplicated or redundant data in current storage systems. Data de-duplication, which uses lossless data compression schemes to minimize duplicated data at the inter-file level, has therefore received broad attention in recent years. However, current approaches and storage systems still face research challenges, such as how to chunk files more efficiently while better leveraging the potential similarity and identity among dedicated applications, and how to store the chunks effectively and reliably on secondary storage devices. In this paper, we propose ADMAD, an application-driven metadata aware de-duplication archival storage system, which uses metadata from different levels of the I/O path to direct the partitioning of files into more meaningful data chunks (MC) and so maximally reduce inter-file duplication. Because these chunks vary in length, storing them directly on storage devices may cause heavy fragmentation and a high percentage of random disk accesses, which is very inefficient. Therefore, in ADMAD, chunks are further packaged into fixed-size objects that serve as storage units, which speeds up I/O and eases data management. Preliminary experiments have demonstrated that, compared with current methods, the proposed system further reduces the required storage space (by 20% to nearly 50%, depending on the dataset) and substantially improves write performance (by about 50% to 70% on average).
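The abstract describes two mechanisms: splitting files at metadata-derived boundaries into meaningful chunks, and packing the resulting variable-size chunks into fixed-size objects. The Python sketch below illustrates both under stated assumptions. It is not ADMAD's implementation. The boundary offsets are assumed to come from some application-specific parser, which is not modeled here. SHA-1 fingerprinting is a common choice in de-duplication systems, but the abstract does not name a hash. The 64 KiB object size and all names (`chunk_by_boundaries`, `DedupStore`, `write_file`, `read_file`) are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical object size for this sketch; the abstract does not state ADMAD's value.
DEFAULT_OBJECT_SIZE = 64 * 1024


def chunk_by_boundaries(data: bytes, boundaries: list[int]) -> list[bytes]:
    """Split data at start offsets supplied by an (assumed) application-aware parser."""
    starts = sorted(s for s in {0, *boundaries} if 0 <= s < len(data))
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]


@dataclass
class StorageObject:
    """Fixed-capacity storage unit; variable-size chunks are appended back to back."""
    object_id: int
    capacity: int
    data: bytearray = field(default_factory=bytearray)

    def free(self) -> int:
        return self.capacity - len(self.data)


class DedupStore:
    def __init__(self, object_size: int = DEFAULT_OBJECT_SIZE) -> None:
        self.object_size = object_size
        self.objects: list[StorageObject] = []
        # Chunk fingerprint -> (object id, offset inside object, chunk length).
        self.index: dict[bytes, tuple[int, int, int]] = {}

    def _new_object(self) -> StorageObject:
        obj = StorageObject(len(self.objects), self.object_size)
        self.objects.append(obj)
        return obj

    def put_chunk(self, chunk: bytes) -> bytes:
        fp = hashlib.sha1(chunk).digest()  # Assumed fingerprint function.
        if fp in self.index:
            return fp  # Duplicate chunk: keep only a reference, store no new bytes.
        if len(chunk) > self.object_size:
            # Simplification: a real system would split or specially handle oversized chunks.
            raise ValueError("chunk does not fit in one object")
        obj = self.objects[-1] if self.objects else self._new_object()
        if obj.free() < len(chunk):
            obj = self._new_object()  # Current object is too full: open the next one.
        self.index[fp] = (obj.object_id, len(obj.data), len(chunk))
        obj.data.extend(chunk)
        return fp

    def get_chunk(self, fp: bytes) -> bytes:
        object_id, offset, length = self.index[fp]
        return bytes(self.objects[object_id].data[offset:offset + length])


def write_file(store: DedupStore, data: bytes, boundaries: list[int]) -> list[bytes]:
    """Store a file and return its recipe: the ordered list of chunk fingerprints."""
    return [store.put_chunk(c) for c in chunk_by_boundaries(data, boundaries)]


def read_file(store: DedupStore, recipe: list[bytes]) -> bytes:
    """Rebuild a file by concatenating its chunks in recipe order."""
    return b"".join(store.get_chunk(fp) for fp in recipe)


if __name__ == "__main__":
    store = DedupStore(object_size=32)
    # Two files whose second and third records are identical.
    r1 = write_file(store, b"aaaabbbbbbbbcccc", [4, 12])
    r2 = write_file(store, b"zzzzbbbbbbbbcccc", [4, 12])
    print(len(store.index))    # 4 unique chunks for 6 chunk references
    print(len(store.objects))  # 1 object holds all 20 unique bytes
```

Packing chunks back to back into a few large objects turns many small, variable-size writes into appends to the currently open object. That is the intuition behind the abstract's claim of fewer fragments and fewer random disk accesses. The sketch keeps everything in memory and does not model the on-disk layout, reliability, or object padding.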
