Telescope: Telemetry at Terabyte Scale
Alan Nair, Sandeep Kumar, Aravinda Prasad, Andy Rudoff, Sreenivas Subramoney
arXiv:2311.10275 (CS: Operating Systems), 2023-11-17

Data-hungry applications that require terabytes of memory have become
widespread in recent years. To meet the memory needs of these applications,
data centers are embracing tiered memory architectures with near and far memory
tiers. Precise, efficient, and timely identification of hot and cold data and
their placement in appropriate tiers is critical for performance in such
systems. Unfortunately, the existing state-of-the-art telemetry techniques for
hot and cold data detection are ineffective at the terabyte scale.

We propose Telescope, a novel technique that profiles different levels of the
application's page table tree for fast and efficient identification of hot and
cold data. Telescope is based on the observation that, for memory- and
TLB-intensive workloads, the higher levels of a page table tree are also
accessed frequently during hardware page table walks. Hence, the hotness of an
upper-level entry captures, at a coarser granularity, the hotness of the
subtree beneath it, i.e., of the address-space sub-region it maps. We exploit this insight
to quickly converge on even a few megabytes of hot data and efficiently
identify several gigabytes of cold data in terabyte-scale applications.
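The observation can be illustrated with a toy model. This Python sketch is hypothetical and is not Telescope's implementation: it assumes x86-64 four-level paging with 4 KiB pages, treats every access as a TLB miss that walks all four levels and sets the Accessed bit in each entry it uses, and the names `ToyPageTable` and `drill_down` are made up for illustration. A top-down scan expands only accessed entries, so it converges on the hot pages, while a present but unaccessed upper-level entry marks its whole subtree cold without any of the entries beneath it being read.

```python
# Toy model of page-table-tree profiling (hypothetical; not Telescope's code).
# Assumes x86-64 4-level paging with 4 KiB pages and that every access misses
# the TLB and walks all four levels (TLB and page-walk caches are ignored).

LEVELS = ["PGD", "PUD", "PMD", "PTE"]
SHIFT = {"PGD": 39, "PUD": 30, "PMD": 21, "PTE": 12}  # one entry maps 1 << shift bytes
FANOUT = 512  # 9 index bits per level


class ToyPageTable:
    """Tracks only the Present and Accessed bits of each paging-structure entry."""

    def __init__(self):
        self.present = {lvl: set() for lvl in LEVELS}
        self.accessed = {lvl: set() for lvl in LEVELS}

    def _entries(self, vaddr):
        # The entry number at each level is the virtual address >> that level's shift.
        return [(lvl, vaddr >> SHIFT[lvl]) for lvl in LEVELS]

    def map_page(self, vaddr):
        for lvl, entry in self._entries(vaddr):
            self.present[lvl].add(entry)

    def walk(self, vaddr):
        # A hardware walk sets the Accessed bit in every entry it uses.
        for lvl, entry in self._entries(vaddr):
            self.accessed[lvl].add(entry)


def drill_down(pt):
    """Scan top-down, expanding only present entries whose Accessed bit is set.

    Returns (hot 4 KiB page numbers, cold (level, entry) pairs, entries examined).
    """
    frontier, cold, examined = [0], [], 0  # [0] stands for the single root table
    for lvl in LEVELS:
        next_frontier = []
        for parent in frontier:
            for child in range(parent * FANOUT, (parent + 1) * FANOUT):
                if child not in pt.present[lvl]:
                    continue  # nothing is mapped below a non-present entry
                examined += 1
                if child in pt.accessed[lvl]:
                    next_frontier.append(child)  # hot: look inside it at the next level
                else:
                    cold.append((lvl, child))  # cold subtree: its children are never read
        frontier = next_frontier
    return frontier, cold, examined  # after the PTE level, frontier holds page numbers


GIB, MIB2, PAGE = 1 << 30, 1 << 21, 1 << 12
pt = ToyPageTable()
# Footprint: 64 pages in each of two 2 MiB regions inside four different 1 GiB regions.
for g in range(1, 5):
    for m in range(2):
        for p in range(64):
            pt.map_page(g * GIB + m * MIB2 + p * PAGE)

hot = [GIB + p * PAGE for p in range(16)]  # 16 hot pages in a single 2 MiB region
for vaddr in hot:
    pt.walk(vaddr)

pages, cold, examined = drill_down(pt)
print(len(pages), len(cold), examined, len(pt.present["PTE"]))  # prints 16 52 71 512
```

In this toy run the scan examines 71 present entries instead of the 512 leaf PTEs a flat scan would read, and three unaccessed PUD entries label their 1 GiB regions cold without visiting any PTE below them. A real profiler would also have to handle TLB hits, which set no bits, and periodic clearing of the Accessed bits; both are outside this sketch.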
Importantly, the technique can scale seamlessly to petabyte-scale
applications. Telescope's telemetry achieves over 90% precision and recall
while using just 0.009% of a single CPU for microbenchmarks with a 5 TB memory
footprint. Memory
tiering based on Telescope yields a 5.6% to 34% throughput improvement for
real-world benchmarks with 1-2 TB memory footprints, compared with other
state-of-the-art telemetry techniques.
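A rough count helps show why profiling upper levels stays cheap at this scale. The sketch assumes x86-64 four-level paging, a fully populated and contiguous footprint, and reads "5 TB" as 5 TiB; these are illustrative assumptions, not figures from the paper. It counts how many entries a scanner would visit at each level to cover the microbenchmark footprint.

```python
# Back-of-the-envelope scan cost per page-table level (illustrative only).
# Assumes x86-64 4-level paging, a fully populated contiguous footprint,
# and "5 TB" read as 5 TiB.
SHIFT = {"PTE (4 KiB)": 12, "PMD (2 MiB)": 21, "PUD (1 GiB)": 30, "PGD (512 GiB)": 39}


def entries_to_scan(footprint_bytes):
    # Ceiling division: a partially used region still needs one entry.
    return {lvl: -(-footprint_bytes // (1 << s)) for lvl, s in SHIFT.items()}


counts = entries_to_scan(5 << 40)
for lvl, n in counts.items():
    print(f"{lvl:>14}: {n:,}")
```

Under these assumptions the leaf level holds about 1.34 billion entries while the 1 GiB level holds 5,120, so each upper-level Accessed bit summarizes 262,144 leaf entries.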