Telescope: Telemetry at Terabyte Scale

arXiv - CS - Operating Systems. Pub Date: 2023-11-17. DOI: arxiv-2311.10275

Alan Nair, Sandeep Kumar, Aravinda Prasad, Andy Rudoff, Sreenivas Subramoney

Citations: 0

Abstract

Data-hungry applications that require terabytes of memory have become widespread in recent years. To meet the memory needs of these applications, data centers are embracing tiered memory architectures with near and far memory tiers. Precise, efficient, and timely identification of hot and cold data and their placement in appropriate tiers is critical for performance in such systems. Unfortunately, the existing state-of-the-art telemetry techniques for hot and cold data detection are ineffective at the terabyte scale. We propose Telescope, a novel technique that profiles different levels of the application's page table tree for fast and efficient identification of hot and cold data. Telescope is based on the observation that, for a memory- and TLB-intensive workload, higher levels of a page table tree are also frequently accessed during a hardware page table walk. Hence, the hotness of the higher levels of the page table tree essentially captures the hotness of its subtrees or address space sub-regions at a coarser granularity. We exploit this insight to quickly converge on even a few megabytes of hot data and efficiently identify several gigabytes of cold data in terabyte-scale applications. Importantly, such a technique can seamlessly scale to petabyte-scale applications. Telescope's telemetry achieves 90%+ precision and recall at just 0.009% single CPU utilization for microbenchmarks with a 5 TB memory footprint. Memory tiering based on Telescope results in 5.6% to 34% throughput improvement for real-world benchmarks with a 1-2 TB memory footprint compared to other state-of-the-art telemetry techniques.
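The core observation is that a hardware page walk touches every level of the page table tree. As a result, an accessed entry near the top summarizes activity across a large address range: 2 MiB per PMD entry, 1 GiB per PUD entry, and 512 GiB per PGD entry with 4 KiB pages on a 4-level x86-64 tree. The toy model below is a hypothetical sketch of that idea, not the authors' implementation. It assumes a contiguous mapping starting at page 0, models accessed bits as Python sets, and walks the tree on every access. Real hardware sets the bits only on TLB misses, and a real profiler must also clear the bits and flush TLBs between sampling windows. The names `page_walk` and `top_down_scan` are illustrative only.

```python
"""Toy model: profiling accessed bits at different page-table levels.

Hypothetical assumptions (illustration only, not the Telescope code):
x86-64-style 4-level tree, 4 KiB pages, 512 entries per table, mapped
pages are [0, n_pages); a simulated page walk marks every entry it
traverses as accessed, and the accessed bits are kept in Python sets.
"""

PAGE_SHIFT = 12                        # 4 KiB base pages
BITS_PER_LEVEL = 9                     # 512 entries per page-table page
LEVEL_NAMES = ["PGD", "PUD", "PMD", "PTE"]
LEVELS = len(LEVEL_NAMES)


def pages_per_entry(level):
    """Base pages covered by one entry at `level` (0 = top of the tree)."""
    return 1 << (BITS_PER_LEVEL * (LEVELS - 1 - level))


def entries_at(level, n_pages):
    """Number of mapped entries at `level` when pages [0, n_pages) are mapped."""
    return -(-n_pages // pages_per_entry(level))  # ceiling division


def page_walk(accessed, vpn):
    """Simulated hardware walk: set the accessed bit at every level it uses."""
    for level in range(LEVELS):
        accessed[level].add(vpn // pages_per_entry(level))


def top_down_scan(accessed, n_pages, stop_level):
    """Read accessed bits level by level, descending only below accessed entries.

    Returns (accessed entries found at stop_level, total entries inspected).
    """
    candidates = list(range(entries_at(0, n_pages)))
    inspected = 0
    for level in range(stop_level + 1):
        inspected += len(candidates)
        hot = [e for e in candidates if e in accessed[level]]
        if level == stop_level:
            return set(hot), inspected
        # Only the children of accessed entries are read at the next level.
        limit = entries_at(level + 1, n_pages)
        candidates = [child
                      for e in hot
                      for child in range(e << BITS_PER_LEVEL, (e + 1) << BITS_PER_LEVEL)
                      if child < limit]


if __name__ == "__main__":
    for lvl, name in enumerate(LEVEL_NAMES):
        print(f"{name} entry covers {pages_per_entry(lvl) << PAGE_SHIFT} bytes")

    n_pages = 16 * 1024 * 1024           # 64 GiB mapped footprint (toy size)
    # Hypothetical hot set: pages inside two 2 MiB regions (PMD 1000 and 20000).
    hot_vpns = ([(1000 << BITS_PER_LEVEL) + i for i in range(0, 512, 8)] +
                [(20000 << BITS_PER_LEVEL) + i for i in range(0, 512, 16)])
    accessed = [set() for _ in range(LEVELS)]
    for vpn in hot_vpns:
        page_walk(accessed, vpn)

    hot, cost = top_down_scan(accessed, n_pages, stop_level=3)
    print(f"top-down: {len(hot)} hot pages after inspecting {cost} entries")
    print(f"flat leaf scan would inspect {entries_at(3, n_pages)} entries")
    cold = entries_at(1, n_pages) - len(accessed[1])
    print(f"{cold} of {entries_at(1, n_pages)} 1 GiB regions untouched (cold)")
```

In this toy run, reading the PUD level alone shows that most 1 GiB regions were never touched, which is a coarse cold-data signal. Descending only under accessed entries reaches the hot 4 KiB pages after inspecting a few thousand entries instead of every leaf PTE. The design choice being illustrated is to trade granularity for scan cost at the upper levels and to pay for fine granularity only where the upper levels report activity.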
