Alexandria: A Proof-of-Concept Implementation and Evaluation of Generalised Data Deduplication

Lars Nielsen, Rasmus Vestergaard, N. Yazdani, Prasad Talasila, D. Lucani, M. Sipos
Published in: 2019 IEEE Globecom Workshops (GC Wkshps)
Publication date: 2019-12-01
DOI: 10.1109/GCWkshps45667.2019.9024368
URL: https://doi.org/10.1109/GCWkshps45667.2019.9024368
Citations: 13

Abstract

The amount of data generated worldwide is expected to grow from 33 to 175 ZB by 2025, driven in part by the growth of the Internet of Things (IoT) and cyber-physical systems (CPS). To cope with this enormous amount of data, new cloud storage techniques must be developed. Generalised Data Deduplication (GDD) is a new paradigm for reducing the cost of storage by systematically identifying near-identical data chunks, storing their common component once, and storing, for each chunk, a compact representation of the deviation needed to recover the original chunk. This paper presents a system architecture for GDD and a proof-of-concept implementation. We evaluated the compression gain of GDD using three data sets of varying size and content, and compared it to the performance of the EXT4 and ZFS file systems, where the latter employs classic deduplication. We show that GDD provides up to 16.75% compression gain compared to both EXT4 and ZFS for data sets of less than 5 GB.
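The idea of storing a common component once plus a compact per-chunk deviation can be illustrated with a minimal sketch. This is not the Alexandria implementation: the chunk size, the choice of the two low bits of each byte as the deviation, and the names `split_chunk` and `GddStore` are all assumptions made for illustration only.

```python
import hashlib

CHUNK_SIZE = 8   # bytes per chunk (assumed, illustrative)
DEV_BITS = 2     # low bits of each byte treated as the deviation (assumed)
DEV_MASK = (1 << DEV_BITS) - 1


def split_chunk(chunk):
    """Split a chunk into (basis, deviation).

    basis: the chunk with the low DEV_BITS of every byte cleared.
    deviation: those low bits packed into a single integer.
    """
    basis = bytes(b & ~DEV_MASK & 0xFF for b in chunk)
    deviation = 0
    for b in chunk:
        deviation = (deviation << DEV_BITS) | (b & DEV_MASK)
    return basis, deviation


class GddStore:
    """Hypothetical in-memory store: each basis is kept once."""

    def __init__(self):
        self.bases = {}   # fingerprint -> basis bytes (deduplicated)
        self.chunks = []  # one (fingerprint, deviation) pair per chunk

    def put(self, data):
        for i in range(0, len(data), CHUNK_SIZE):
            basis, deviation = split_chunk(data[i:i + CHUNK_SIZE])
            key = hashlib.sha256(basis).digest()
            self.bases.setdefault(key, basis)  # store the basis only once
            self.chunks.append((key, deviation))

    def get(self):
        out = bytearray()
        for key, deviation in self.chunks:
            basis = self.bases[key]
            n = len(basis)
            for i, b in enumerate(basis):
                # Unpack this byte's deviation bits and merge them back in.
                shift = DEV_BITS * (n - 1 - i)
                out.append(b | ((deviation >> shift) & DEV_MASK))
        return bytes(out)
```

Under these assumptions, the chunks `b"AAAAAAAA"` and `b"AAAAAAAB"` differ only in the low bits of the last byte, so they map to the same basis. Classic deduplication would treat them as two distinct chunks, whereas this sketch stores one basis and two small deviations.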