CDAC: Content-Driven Deduplication-Aware Storage Cache

Yujuan Tan, Wen Xia, Jing Xie, Congcong Xu, Zhichao Yan, Hong Jiang, Yajun Zhao, Min Fu, Xianzhang Chen, Duo Liu
Published in: 2019 35th Symposium on Mass Storage Systems and Technologies (MSST), 2019-05-20
DOI: 10.1109/MSST.2019.00008 (https://doi.org/10.1109/MSST.2019.00008)
Citations: 3

Abstract

Data deduplication, a proven technique for data reduction in backup and archive storage systems, also shows promise for increasing the logical capacity of storage caches by removing redundant data. Our in-depth evaluation of existing deduplication-aware caching algorithms shows that they do improve hit ratios over caching algorithms without deduplication, especially when the cache block size is 4 KB. However, when the block size exceeds 4 KB, a clear trend in modern storage systems, their hit ratios drop significantly. A slight increase in hit ratio from deduplication may not improve overall storage performance because of the high overhead that deduplication incurs. To address this problem, we propose CDAC, a Content-driven Deduplication-Aware Cache, whose cache management strategies exploit the content redundancy of blocks and the intensity with which their content is shared among source addresses. We have implemented CDAC on top of the LRU and ARC algorithms, yielding CDAC-LRU and CDAC-ARC, respectively. Our extensive experiments on real-world traces show that CDAC-LRU and CDAC-ARC outperform the state-of-the-art deduplication-aware caching algorithms, D-LRU and DARC, by up to 19.49X in read cache hit ratio, with an average of 1.95X, when the cache size ranges from 20% to 80% of the working set size and the block size ranges from 4 KB to 64 KB.
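To make the idea of a deduplication-aware cache concrete, the following is a minimal Python sketch of a generic content-addressed LRU read cache. It is not CDAC, D-LRU, or DARC: the abstract does not specify those algorithms, so the class name `DedupLRUCache`, its methods, the SHA-1 fingerprint, and the eviction rule are all assumptions made for illustration. The sketch shows only the two ingredients the abstract names: identical content is stored once regardless of how many source addresses hold it, and the number of addresses sharing a block (exposed here as `sharing_degree`) is available as a signal that a content-driven policy could use.

```python
import hashlib
from collections import OrderedDict


class DedupLRUCache:
    """Toy deduplication-aware LRU read cache (illustrative only, not CDAC)."""

    def __init__(self, capacity_blocks):
        if capacity_blocks < 1:
            raise ValueError("capacity must be at least one block")
        self.capacity = capacity_blocks  # max number of unique content blocks
        self.addr_to_fp = {}             # source address -> content fingerprint
        self.store = OrderedDict()       # fingerprint -> (data, addresses), in LRU order

    @staticmethod
    def fingerprint(data):
        # Assumed fingerprint function; any collision-resistant hash would do.
        return hashlib.sha1(data).hexdigest()

    def read(self, addr):
        """Return the cached block for addr, or None on a miss."""
        fp = self.addr_to_fp.get(addr)
        if fp is None:
            return None
        self.store.move_to_end(fp)       # content becomes most recently used
        return self.store[fp][0]

    def insert(self, addr, data):
        """Cache data for addr, sharing storage with identical content."""
        self._unmap(addr)                # addr may have pointed at other content
        fp = self.fingerprint(data)
        if fp in self.store:
            self.store[fp][1].add(addr)  # duplicate content: only add a mapping
            self.store.move_to_end(fp)
        else:
            if len(self.store) >= self.capacity:
                self._evict()
            self.store[fp] = (data, {addr})
        self.addr_to_fp[addr] = fp

    def sharing_degree(self, addr):
        """Number of source addresses that share the content cached for addr."""
        fp = self.addr_to_fp.get(addr)
        return len(self.store[fp][1]) if fp is not None else 0

    def _unmap(self, addr):
        fp = self.addr_to_fp.pop(addr, None)
        if fp is not None:
            addrs = self.store[fp][1]
            addrs.discard(addr)
            if not addrs:                # no address references this content
                del self.store[fp]

    def _evict(self):
        # Plain LRU over content blocks; drops every address mapped to the victim.
        _, (_, addrs) = self.store.popitem(last=False)
        for a in addrs:
            del self.addr_to_fp[a]
```

With a capacity of two blocks, inserting the same content at addresses 1 and 2 and different content at address 3 keeps all three addresses readable, because the shared content occupies a single slot. This is the logical-capacity gain the abstract attributes to deduplication. A content-driven policy could additionally weigh `sharing_degree` when choosing a victim, but how CDAC does so is not stated in this abstract.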