Skip TLB flushes for reused pages within mmap's

arXiv - CS - Operating Systems Pub Date : 2024-09-17 DOI:arxiv-2409.10946

Frederic Schimmelpfennig, André Brinkmann, Hossein Asadi, Reza Salkhordeh

{"title":"Skip TLB flushes for reused pages within mmap's","authors":"Frederic Schimmelpfennig, André Brinkmann, Hossein Asadi, Reza Salkhordeh","doi":"arxiv-2409.10946","DOIUrl":null,"url":null,"abstract":"Memory access efficiency is significantly enhanced by caching recent address\ntranslations in the CPUs' Translation Lookaside Buffers (TLBs). However, since\nthe operating system is not aware of which core is using a particular mapping,\nit flushes TLB entries across all cores where the application runs whenever\naddresses are unmapped, ensuring security and consistency. These TLB flushes,\nknown as TLB shootdowns, are costly and create a performance and scalability\nbottleneck. A key contributor to TLB shootdowns is memory-mapped I/O,\nparticularly during mmap-munmap cycles and page cache evictions. Often, the\nsame physical pages are reassigned to the same process post-eviction,\npresenting an opportunity for the operating system to reduce the frequency of\nTLB shootdowns. We demonstrate, that by slightly extending the mmap function,\nTLB shootdowns for these \"recycled pages\" can be avoided. Therefore we introduce and implement the \"fast page recycling\" (FPR) feature\nwithin the mmap system call. FPR-mmaps maintain security by only triggering TLB\nshootdowns when a page exits its recycling cycle and is allocated to a\ndifferent process. To ensure consistency when FPR-mmap pointers are used, we\nmade minor adjustments to virtual memory management to avoid the ABA problem.\nUnlike previous methods to mitigate shootdown effects, our approach does not\nrequire any hardware modifications and operates transparently within the\nexisting Linux virtual memory framework. Our evaluations across a variety of CPU, memory, and storage setups,\nincluding persistent memory and Optane SSDs, demonstrate that FPR delivers\nnotable performance gains, with improvements of up to 28% in real-world\napplications and 92% in micro-benchmarks. Additionally, we show that TLB\nshootdowns are a significant source of bottlenecks, previously misattributed to\nother components of the Linux kernel.","PeriodicalId":501333,"journal":{"name":"arXiv - CS - Operating Systems","volume":"14 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Operating Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10946","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Memory access efficiency is significantly enhanced by caching recent address translations in the CPUs' Translation Lookaside Buffers (TLBs). However, since the operating system is not aware of which core is using a particular mapping, it flushes TLB entries across all cores where the application runs whenever addresses are unmapped, ensuring security and consistency. These TLB flushes, known as TLB shootdowns, are costly and create a performance and scalability bottleneck. A key contributor to TLB shootdowns is memory-mapped I/O, particularly during mmap-munmap cycles and page cache evictions. Often, the same physical pages are reassigned to the same process post-eviction, presenting an opportunity for the operating system to reduce the frequency of TLB shootdowns. We demonstrate, that by slightly extending the mmap function, TLB shootdowns for these "recycled pages" can be avoided. Therefore we introduce and implement the "fast page recycling" (FPR) feature within the mmap system call. FPR-mmaps maintain security by only triggering TLB shootdowns when a page exits its recycling cycle and is allocated to a different process. To ensure consistency when FPR-mmap pointers are used, we made minor adjustments to virtual memory management to avoid the ABA problem. Unlike previous methods to mitigate shootdown effects, our approach does not require any hardware modifications and operates transparently within the existing Linux virtual memory framework. Our evaluations across a variety of CPU, memory, and storage setups, including persistent memory and Optane SSDs, demonstrate that FPR delivers notable performance gains, with improvements of up to 28% in real-world applications and 92% in micro-benchmarks. Additionally, we show that TLB shootdowns are a significant source of bottlenecks, previously misattributed to other components of the Linux kernel.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

跳过在 mmap 的

通过将最近的地址映射缓存在 CPU 的转换旁路缓冲区（TLB）中，可以大大提高内存访问效率。但是，由于操作系统不知道哪个内核在使用特定映射，因此每当地址未映射时，操作系统就会在应用程序运行的所有内核中刷新 TLB 条目，以确保安全性和一致性。这些 TLB 刷新（称为 TLB 崩溃）代价高昂，会造成性能和可扩展性瓶颈。造成 TLB 崩溃的一个关键因素是内存映射 I/O，特别是在毫米映射-单元映射周期和页面缓存驱逐期间。通常，相同的物理页面会在驱逐后重新分配给同一个进程，这为操作系统降低 TLB 崩溃频率提供了机会。我们证明，只要稍微扩展一下 mmap 功能，就能避免这些 "回收页面 "的 TLB 崩溃。因此，我们在 mmap 系统调用中引入并实现了 "快速页面回收"（FPR）功能。FPR 映射仅在页面退出循环并分配给不同进程时才触发 TLB 崩溃，从而保持了安全性。为了确保使用 FPR-map 指针时的一致性，我们对虚拟内存管理进行了细微调整，以避免 ABA 问题。我们对包括持久内存和 Optane SSD 在内的各种 CPU、内存和存储设置进行了评估，结果表明，FPR 带来了显著的性能提升，在实际应用中提升了 28%，在微基准测试中提升了 92%。此外，我们还证明了 TLBshootdowns 是瓶颈的一个重要来源，以前曾被错误地归咎于 Linux 内核的其他组件。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

arXiv - CS - Operating Systems

自引率

0.00%

发文量

期刊最新文献

Analysis of Synchronization Mechanisms in Operating Systems Skip TLB flushes for reused pages within mmap's eBPF-mm: Userspace-guided memory management in Linux with eBPF BULKHEAD: Secure, Scalable, and Efficient Kernel Compartmentalization with PKS Rethinking Programmed I/O for Fast Devices, Cheap Cores, and Coherent Interconnects