袋鼠:在Flash上缓存数十亿微小对象

Q3 Computer Science Operating Systems Review (ACM) Pub Date : 2021-10-26 DOI:10.1145/3477132.3483568

Sara McAllister, Benjamin Berg, Julian Tutuncu-Macias, Juncheng Yang, S. Gunasekar, Jimmy Lu, Daniel S. Berger, Nathan Beckmann, G. Ganger

{"title":"袋鼠:在Flash上缓存数十亿微小对象","authors":"Sara McAllister, Benjamin Berg, Julian Tutuncu-Macias, Juncheng Yang, S. Gunasekar, Jimmy Lu, Daniel S. Berger, Nathan Beckmann, G. Ganger","doi":"10.1145/3477132.3483568","DOIUrl":null,"url":null,"abstract":"Many social-media and IoT services have very large working sets consisting of billions of tiny (≈100 B) objects. Large, flash-based caches are important to serving these working sets at acceptable monetary cost. However, caching tiny objects on flash is challenging for two reasons: (i) SSDs can read/write data only in multi-KB \"pages\" that are much larger than a single object, stressing the limited number of times flash can be written; and (ii) very few bits per cached object can be kept in DRAM without losing flash's cost advantage. Unfortunately, existing flash-cache designs fall short of addressing these challenges: write-optimized designs require too much DRAM, and DRAM-optimized designs require too many flash writes. We present Kangaroo, a new flash-cache design that optimizes both DRAM usage and flash writes to maximize cache performance while minimizing cost. Kangaroo combines a large, set-associative cache with a small, log-structured cache. The set-associative cache requires minimal DRAM, while the log-structured cache minimizes Kangaroo's flash writes. Experiments using traces from Facebook and Twitter show that Kangaroo achieves DRAM usage close to the best prior DRAM-optimized design, flash writes close to the best prior write-optimized design, and miss ratios better than both. Kangaroo's design is Pareto-optimal across a range of allowed write rates, DRAM sizes, and flash sizes, reducing misses by 29% over the state of the art. These results are corroborated with a test deployment of Kangaroo in a production flash cache at Facebook.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"17 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Kangaroo: Caching Billions of Tiny Objects on Flash\",\"authors\":\"Sara McAllister, Benjamin Berg, Julian Tutuncu-Macias, Juncheng Yang, S. Gunasekar, Jimmy Lu, Daniel S. Berger, Nathan Beckmann, G. Ganger\",\"doi\":\"10.1145/3477132.3483568\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many social-media and IoT services have very large working sets consisting of billions of tiny (≈100 B) objects. Large, flash-based caches are important to serving these working sets at acceptable monetary cost. However, caching tiny objects on flash is challenging for two reasons: (i) SSDs can read/write data only in multi-KB \\\"pages\\\" that are much larger than a single object, stressing the limited number of times flash can be written; and (ii) very few bits per cached object can be kept in DRAM without losing flash's cost advantage. Unfortunately, existing flash-cache designs fall short of addressing these challenges: write-optimized designs require too much DRAM, and DRAM-optimized designs require too many flash writes. We present Kangaroo, a new flash-cache design that optimizes both DRAM usage and flash writes to maximize cache performance while minimizing cost. Kangaroo combines a large, set-associative cache with a small, log-structured cache. The set-associative cache requires minimal DRAM, while the log-structured cache minimizes Kangaroo's flash writes. Experiments using traces from Facebook and Twitter show that Kangaroo achieves DRAM usage close to the best prior DRAM-optimized design, flash writes close to the best prior write-optimized design, and miss ratios better than both. Kangaroo's design is Pareto-optimal across a range of allowed write rates, DRAM sizes, and flash sizes, reducing misses by 29% over the state of the art. These results are corroborated with a test deployment of Kangaroo in a production flash cache at Facebook.\",\"PeriodicalId\":38935,\"journal\":{\"name\":\"Operating Systems Review (ACM)\",\"volume\":\"17 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Operating Systems Review (ACM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3477132.3483568\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Operating Systems Review (ACM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3477132.3483568","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Computer Science","Score":null,"Total":0}

引用次数: 19

摘要

许多社交媒体和物联网服务都有非常大的工作集，由数十亿个微小(≈100 B)对象组成。大型的基于闪存的缓存对于以可接受的货币成本为这些工作集提供服务非常重要。然而，在闪存上缓存微小对象是具有挑战性的，原因有两个:(i) ssd只能在多kb的“页”中读取/写入数据，这比单个对象大得多，强调闪存可以写入的次数有限;(ii)在不失去闪存的成本优势的情况下，每个缓存对象可以保存在DRAM中的比特很少。不幸的是，现有的闪存缓存设计无法解决这些挑战:写入优化设计需要太多的DRAM，而DRAM优化设计需要太多的闪存写入。我们提出袋鼠，一种新的闪存缓存设计，优化了DRAM的使用和闪存写入，以最大限度地提高缓存性能，同时最大限度地降低成本。Kangaroo结合了一个大的、集关联的缓存和一个小的、日志结构的缓存。集合关联缓存需要最少的DRAM，而日志结构缓存则最小化了Kangaroo的闪存写入。使用Facebook和Twitter的痕迹进行的实验表明，Kangaroo实现了接近最佳先前DRAM优化设计的DRAM使用率，接近最佳先前写入优化设计的闪存写入，并且缺失率优于两者。袋鼠的设计在允许的写入速率、DRAM大小和闪存大小范围内都是帕累托最优的，比目前的技术水平减少了29%的失误。这些结果在Facebook的生产快闪缓存中得到了袋鼠测试部署的证实。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Kangaroo: Caching Billions of Tiny Objects on Flash

Many social-media and IoT services have very large working sets consisting of billions of tiny (≈100 B) objects. Large, flash-based caches are important to serving these working sets at acceptable monetary cost. However, caching tiny objects on flash is challenging for two reasons: (i) SSDs can read/write data only in multi-KB "pages" that are much larger than a single object, stressing the limited number of times flash can be written; and (ii) very few bits per cached object can be kept in DRAM without losing flash's cost advantage. Unfortunately, existing flash-cache designs fall short of addressing these challenges: write-optimized designs require too much DRAM, and DRAM-optimized designs require too many flash writes. We present Kangaroo, a new flash-cache design that optimizes both DRAM usage and flash writes to maximize cache performance while minimizing cost. Kangaroo combines a large, set-associative cache with a small, log-structured cache. The set-associative cache requires minimal DRAM, while the log-structured cache minimizes Kangaroo's flash writes. Experiments using traces from Facebook and Twitter show that Kangaroo achieves DRAM usage close to the best prior DRAM-optimized design, flash writes close to the best prior write-optimized design, and miss ratios better than both. Kangaroo's design is Pareto-optimal across a range of allowed write rates, DRAM sizes, and flash sizes, reducing misses by 29% over the state of the art. These results are corroborated with a test deployment of Kangaroo in a production flash cache at Facebook.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Operating Systems Review (ACM) Computer Science-Computer Networks and Communications

CiteScore

2.80

自引率

0.00%

发文量

期刊介绍： Operating Systems Review (OSR) is a publication of the ACM Special Interest Group on Operating Systems (SIGOPS), whose scope of interest includes: computer operating systems and architecture for multiprogramming, multiprocessing, and time sharing; resource management; evaluation and simulation; reliability, integrity, and security of data; communications among computing processors; and computer system modeling and analysis.

期刊最新文献

Disaggregated GPU Acceleration for Serverless Applications Navigating Performance-Efficiency Tradeoffs in Serverless Computing: Deduplication to the Rescue! Using Local Cache Coherence for Disaggregated Memory Systems Make It Real: An End-to-End Implementation of A Physically Disaggregated Data Center Memory disaggregation: why now and what are the challenges