Telescope: Telemetry at Terabyte Scale

Alan Nair, Sandeep Kumar, Aravinda Prasad, Andy Rudoff, Sreenivas Subramoney
arXiv - CS - Operating Systems. Published 2023-11-17. DOI: arxiv-2311.10275 (https://doi.org/arxiv-2311.10275)

Abstract

Data-hungry applications that require terabytes of memory have become widespread in recent years. To meet the memory needs of these applications, data centers are embracing tiered memory architectures with near and far memory tiers. Precise, efficient, and timely identification of hot and cold data and their placement in appropriate tiers is critical for performance in such systems. Unfortunately, the existing state-of-the-art telemetry techniques for hot and cold data detection are ineffective at the terabyte scale. We propose Telescope, a novel technique that profiles different levels of the application's page table tree for fast and efficient identification of hot and cold data. Telescope is based on the observation that, for a memory- and TLB-intensive workload, higher levels of a page table tree are also frequently accessed during a hardware page table walk. Hence, the hotness of the higher levels of the page table tree essentially captures the hotness of its subtrees or address space sub-regions at a coarser granularity. We exploit this insight to quickly converge on even a few megabytes of hot data and efficiently identify several gigabytes of cold data in terabyte-scale applications. Importantly, such a technique can seamlessly scale to petabyte-scale applications. Telescope's telemetry achieves 90%+ precision and recall at just 0.009% single CPU utilization for microbenchmarks with a 5 TB memory footprint. Memory tiering based on Telescope results in 5.6% to 34% throughput improvement for real-world benchmarks with a 1-2 TB memory footprint compared to other state-of-the-art telemetry techniques.
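
The top-down idea in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes an x86-64-style 4-level page table (4 KiB pages, 512 entries per table) and assumes hotness is observed through a per-entry accessed bit, which the abstract does not spell out. The names `region_size`, `sample_accessed_bits`, `find_hot_pages`, and `demo` are hypothetical. The sketch also marks an entry on every access, whereas real hardware would only touch upper levels during a page table walk.

```python
PAGE_SHIFT = 12       # assumed 4 KiB base pages
BITS_PER_LEVEL = 9    # assumed 512 entries per page table
LEVELS = 4            # level 0 = leaf entries, level 3 = top level


def region_size(level: int) -> int:
    """Bytes of address space covered by one entry at the given level."""
    return 1 << (PAGE_SHIFT + BITS_PER_LEVEL * level)


def entry_index(addr: int, level: int) -> int:
    """Global index of the entry at `level` that a walk for `addr` uses."""
    return addr >> (PAGE_SHIFT + BITS_PER_LEVEL * level)


def sample_accessed_bits(accesses):
    """Simulate one profiling window: mark every entry on each walk path."""
    bits = [set() for _ in range(LEVELS)]
    for addr in accesses:
        for level in range(LEVELS):
            bits[level].add(entry_index(addr, level))
    return bits


def find_hot_pages(bits, top_entries):
    """Descend from the top level, expanding only entries marked accessed.

    Returns the hot leaf entries and the number of entries inspected.
    """
    checked = len(top_entries)
    hot = [e for e in top_entries if e in bits[LEVELS - 1]]
    for level in range(LEVELS - 2, -1, -1):
        children = [(e << BITS_PER_LEVEL) + i
                    for e in hot
                    for i in range(1 << BITS_PER_LEVEL)]
        checked += len(children)
        hot = [c for c in children if c in bits[level]]
    return hot, checked


def demo():
    """1 TiB footprint with a single 4 MiB hot region at offset 600 GiB."""
    base = 600 << 30
    accesses = [base + i * 4096 for i in range(1024)]
    bits = sample_accessed_bits(accesses)
    top_entries = list(range(2))  # 2 top-level entries of 512 GiB each
    return find_hot_pages(bits, top_entries)


if __name__ == "__main__":
    hot, checked = demo()
    total_pages = (1 << 40) >> PAGE_SHIFT
    print(f"hot pages found: {len(hot)}")
    print(f"entries inspected: {checked} of {total_pages} leaf entries")
```

In this toy setup the search inspects 2,050 entries to locate 1,024 hot pages, while a flat scan of the leaf level would visit all 268,435,456 leaf entries of the 1 TiB footprint. Every subtree whose upper-level entry is unmarked is classified as cold without visiting its leaves, which is the coarse-granularity effect the abstract describes.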