Adaptive insertion policies for managing shared caches

A. Jaleel, William Hasenplaugh, Moinuddin K. Qureshi, Julien Sébot, S. Steely, J. Emer
DOI: 10.1145/1454115.1454145
Published in: 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT), October 25, 2008
Citations: 328

Abstract

Chip Multiprocessors (CMPs) allow different applications to execute concurrently on a single chip. When applications with differing memory demands compete for a shared cache, the conventional LRU replacement policy can significantly degrade cache performance when the aggregate working set size exceeds the shared cache capacity. In such cases, shared cache performance can be significantly improved by preserving the entire working set of applications that can co-exist in the cache and preserving some portion of the working set of the remaining applications. This paper investigates the use of adaptive insertion policies to manage shared caches. We show that directly extending the recently proposed Dynamic Insertion Policy (DIP) is inadequate for shared caches since DIP is unaware of the characteristics of individual applications. We propose the Thread-Aware Dynamic Insertion Policy (TADIP), which takes into account the memory requirements of each of the concurrently executing applications. Our evaluation with multi-programmed workloads for 2-core, 4-core, 8-core, and 16-core CMPs shows that a TADIP-managed shared cache improves overall throughput by as much as 94%, 64%, 26%, and 16%, respectively (on average 14%, 18%, 15%, and 17%), over the baseline LRU policy. The performance benefit of TADIP is 2.6x compared to DIP and 1.3x compared to the recently proposed Utility-based Cache Partitioning (UCP) scheme. We also show that a TADIP-managed shared cache provides performance benefits similar to doubling the size of an LRU-managed cache. Furthermore, TADIP requires a total storage overhead of less than two bytes per core, does not require changes to the existing cache structure, and performs similarly to LRU for LRU-friendly workloads.
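The insertion-policy idea underlying DIP (and, per thread, TADIP) can be illustrated with a small simulation. The sketch below is not the paper's implementation: it models a single cache set, compares conventional MRU insertion (plain LRU behavior) against bimodal insertion (new lines usually placed at the LRU position, occasionally at MRU), and omits the set-dueling mechanism that DIP/TADIP use to pick a policy per thread. All class and variable names are illustrative.

```python
import random

class CacheSet:
    """One set of a set-associative cache with an adjustable insertion point.
    lines[0] is the MRU position; lines[-1] is the LRU (eviction) position."""

    def __init__(self, ways, mru_insert_prob):
        self.ways = ways
        # 1.0 reproduces conventional LRU insertion; a small value (e.g. 1/32)
        # gives the thrash-resistant bimodal insertion mode used by DIP.
        self.mru_insert_prob = mru_insert_prob
        self.lines = []
        self.hits = 0
        self.accesses = 0

    def access(self, tag):
        self.accesses += 1
        if tag in self.lines:
            self.hits += 1
            self.lines.remove(tag)
            self.lines.insert(0, tag)          # promote to MRU on a hit
        else:
            if len(self.lines) == self.ways:
                self.lines.pop()               # evict the LRU line
            if random.random() < self.mru_insert_prob:
                self.lines.insert(0, tag)      # conventional MRU insertion
            else:
                self.lines.append(tag)         # bimodal: insert at LRU position

random.seed(0)
WAYS = 8
lru = CacheSet(WAYS, mru_insert_prob=1.0)
bip = CacheSet(WAYS, mru_insert_prob=1 / 32)

# A cyclic working set of 12 lines does not fit in 8 ways, so pure LRU
# insertion thrashes (every access misses), while bimodal insertion keeps
# a useful fraction of the working set resident.
for _ in range(1000):
    for tag in range(12):
        lru.access(tag)
        bip.access(tag)

print(f"LRU insertion hit rate:     {lru.hits / lru.accesses:.1%}")
print(f"bimodal insertion hit rate: {bip.hits / bip.accesses:.1%}")
```

For a working set larger than the set's associativity, the LRU-insertion set scores zero hits while the bimodal set retains part of the working set, which is the effect TADIP exploits per application when deciding how each thread's lines should be inserted into the shared cache.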