Practical Bounds on Optimal Caching with Variable Object Sizes

Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems
Publication date: 2018-06-12
DOI: 10.1145/3219617.3219627

Daniel S. Berger, Nathan Beckmann, Mor Harchol-Balter

Citations: 12

Abstract

Many recent caching systems aim to improve miss ratios, but there is no good sense among practitioners of how much further miss ratios can be improved. In other words, should the systems community continue working on this problem? Currently, there is no principled answer to this question. In practice, object sizes often vary by several orders of magnitude, where computing the optimal miss ratio (OPT) is known to be NP-hard. The few known results on caching with variable object sizes provide very weak bounds and are impractical to compute on traces of realistic length. We propose a new method to compute upper and lower bounds on OPT. Our key insight is to represent caching as a min-cost flow problem, hence we call our method the flow-based offline optimal (FOO). We prove that, under simple independence assumptions, FOO's bounds become tight as the number of objects goes to infinity. Indeed, FOO's error over 10M requests of production CDN and storage traces is negligible: at most 0.3%. FOO thus reveals, for the first time, the limits of caching with variable object sizes. While FOO is very accurate, it is computationally impractical on traces with hundreds of millions of requests. We therefore extend FOO to obtain more efficient bounds on OPT, which we call practical flow-based offline optimal (PFOO). We evaluate PFOO on several full production traces and use it to compare OPT to prior online policies. This analysis shows that current caching systems are in fact still far from optimal, suffering 11-43% more cache misses than OPT, whereas the best prior offline bounds suggest that there is essentially no room for improvement.
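The Python sketch below illustrates the min-cost flow idea on a toy trace. It is not the authors' FOO implementation; the graph construction is an assumption made for illustration:

- Each request is a node.
- An "inner" edge from request t to t+1 has capacity equal to the cache size and cost 0. It represents bytes held in cache.
- Each reuse interval of an object, from one request to that object's next request, gets an "outer" edge. Its capacity equals the object's size and its cost is 1/size per byte, so routing the whole object around the cache costs exactly one miss.

The names `add_edge`, `min_cost_flow`, and `flow_bounds` are hypothetical. Object sizes are assumed to be positive integers that do not change between requests, and first requests are counted as compulsory misses. The fractional min cost gives a lower bound on misses. Counting every interval that sends any flow over its outer edge as a full miss corresponds to a feasible caching schedule in this model, and therefore gives an upper bound.

```python
from fractions import Fraction


class Edge:
    """Residual-graph edge; `rev` indexes the paired reverse edge in graph[to]."""
    __slots__ = ("to", "cap", "cost", "rev")

    def __init__(self, to, cap, cost, rev):
        self.to, self.cap, self.cost, self.rev = to, cap, cost, rev


def add_edge(graph, u, v, cap, cost):
    """Add u->v plus its zero-capacity reverse edge; return the forward edge."""
    graph[u].append(Edge(v, cap, cost, len(graph[v])))
    graph[v].append(Edge(u, 0, -cost, len(graph[u]) - 1))
    return graph[u][-1]


def min_cost_flow(graph, src, sink):
    """Successive shortest paths; Bellman-Ford handles negative residual costs."""
    n = len(graph)
    flow, cost = 0, Fraction(0)
    while True:
        dist = [None] * n            # None means unreachable
        prev = [None] * n            # (node, edge index) on the shortest path
        dist[src] = Fraction(0)
        changed = True
        while changed:               # relax until no distance improves
            changed = False
            for u in range(n):
                if dist[u] is None:
                    continue
                for k, e in enumerate(graph[u]):
                    if e.cap > 0 and (dist[e.to] is None or dist[u] + e.cost < dist[e.to]):
                        dist[e.to] = dist[u] + e.cost
                        prev[e.to] = (u, k)
                        changed = True
        if dist[sink] is None:       # no augmenting path left: flow is maximum
            return flow, cost
        push, v = None, sink         # bottleneck capacity along the path
        while v != src:
            u, k = prev[v]
            c = graph[u][k].cap
            push = c if push is None else min(push, c)
            v = u
        v = sink
        while v != src:              # augment and update residual capacities
            u, k = prev[v]
            e = graph[u][k]
            e.cap -= push
            graph[e.to][e.rev].cap += push
            v = u
        flow += push
        cost += push * dist[sink]


def flow_bounds(trace, capacity):
    """Return (lower, upper) miss-count bounds for a list of (object_id, size)."""
    n = len(trace)
    src, sink = n, n + 1
    graph = [[] for _ in range(n + 2)]
    for t in range(n - 1):
        # Inner edge: bytes kept in cache between request t and t+1.
        add_edge(graph, t, t + 1, capacity, Fraction(0))
    net = [0] * n                    # net supply per request node
    outer = []                       # (outer edge, size) per reuse interval
    last_seen = {}
    for j, (obj, size) in enumerate(trace):
        if obj in last_seen:
            i = last_seen[obj]
            # Outer edge: uncached bytes over (i, j); all `size` bytes cost 1 miss.
            outer.append((add_edge(graph, i, j, size, Fraction(1, size)), size))
            net[i] += size
            net[j] -= size
        last_seen[obj] = j
    for v, b in enumerate(net):
        if b > 0:
            add_edge(graph, src, v, b, Fraction(0))
        elif b < 0:
            add_edge(graph, v, sink, -b, Fraction(0))
    _, cost = min_cost_flow(graph, src, sink)
    compulsory = len(last_seen)      # first request to each object always misses
    # Round up: any interval with flow on its outer edge counts as a full miss.
    rounded = sum(1 for e, size in outer if e.cap < size)
    return compulsory + cost, compulsory + rounded
```

Consider a 2-byte cache and the trace `[("a", 2), ("b", 1), ("a", 2), ("b", 1)]`. For this input, `flow_bounds(trace, 2)` returns `(Fraction(5, 2), 3)`. The relaxation keeps b and half of a in cache. The rounded schedule keeps only b, which here matches the optimal 3 misses.

`Fraction` keeps the 1/size costs exact. Bellman-Ford is used because reverse residual edges carry negative costs; Dijkstra with potentials would be the usual faster choice. Each iteration augments along a single path, so this sketch is only suitable for toy traces.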
