ALLARM:为线程本地数据优化稀疏目录

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2014-03-24 DOI:10.7873/DATE.2014.091

Amitabha Roy, Timothy M. Jones

{"title":"ALLARM:为线程本地数据优化稀疏目录","authors":"Amitabha Roy, Timothy M. Jones","doi":"10.7873/DATE.2014.091","DOIUrl":null,"url":null,"abstract":"Large-scale cache-coherent systems often impose unnecessary overhead on data that is thread-private for the whole of its lifetime. These include resources devoted to tracking the coherence state of the data, as well as unnecessary coherence messages sent out over the interconnect. In this paper we show how the memory allocation strategy for non-uniform memory access (NUMA) systems can be exploited to remove any coherence-related traffic for thread-local data, as well removing the need to track those cache lines in sparse directories. Our strategy is to allocate directory state only on a miss from a node in a different affinity domain from the directory. We call this ALLocAte on Remote Miss, or ALLARM. Our solution is entirely backward compatible with existing operating systems and software, and provides a means to scale cache coherence into the many-core era. On a mix of SPLASH2 and Parsec workloads, ALLARM is able to improve performance by 13% on average while reducing dynamic energy consumption by 9% in the on-chip network and 15% in the directory controller. This is achieved through a 46% reduction in the number of sparse directory entries evicted.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"8 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"ALLARM: Optimizing sparse directories for thread-local data\",\"authors\":\"Amitabha Roy, Timothy M. Jones\",\"doi\":\"10.7873/DATE.2014.091\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large-scale cache-coherent systems often impose unnecessary overhead on data that is thread-private for the whole of its lifetime. These include resources devoted to tracking the coherence state of the data, as well as unnecessary coherence messages sent out over the interconnect. In this paper we show how the memory allocation strategy for non-uniform memory access (NUMA) systems can be exploited to remove any coherence-related traffic for thread-local data, as well removing the need to track those cache lines in sparse directories. Our strategy is to allocate directory state only on a miss from a node in a different affinity domain from the directory. We call this ALLocAte on Remote Miss, or ALLARM. Our solution is entirely backward compatible with existing operating systems and software, and provides a means to scale cache coherence into the many-core era. On a mix of SPLASH2 and Parsec workloads, ALLARM is able to improve performance by 13% on average while reducing dynamic energy consumption by 9% in the on-chip network and 15% in the directory controller. This is achieved through a 46% reduction in the number of sparse directory entries evicted.\",\"PeriodicalId\":6550,\"journal\":{\"name\":\"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)\",\"volume\":\"8 1\",\"pages\":\"1-6\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-03-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.7873/DATE.2014.091\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.7873/DATE.2014.091","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

摘要

大规模缓存一致的系统通常会对在整个生命周期内都是线程私有的数据施加不必要的开销。这包括用于跟踪数据相干状态的资源，以及通过互连发送的不必要的相干消息。在本文中，我们展示了如何利用非统一内存访问(NUMA)系统的内存分配策略来消除线程本地数据的任何与一致性相关的流量，以及消除在稀疏目录中跟踪这些缓存线的需要。我们的策略是，仅在从与目录不同的关联域中的节点丢失时才分配目录状态。我们将此称为对远程错过分配，或ALLARM。我们的解决方案完全向后兼容现有的操作系统和软件，并提供了一种将缓存一致性扩展到多核时代的方法。在SPLASH2和Parsec工作负载的混合情况下，ALLARM能够平均提高13%的性能，同时在片上网络中减少9%的动态能耗，在目录控制器中减少15%的动态能耗。这是通过减少46%的稀疏目录条目数量来实现的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

ALLARM: Optimizing sparse directories for thread-local data

Large-scale cache-coherent systems often impose unnecessary overhead on data that is thread-private for the whole of its lifetime. These include resources devoted to tracking the coherence state of the data, as well as unnecessary coherence messages sent out over the interconnect. In this paper we show how the memory allocation strategy for non-uniform memory access (NUMA) systems can be exploited to remove any coherence-related traffic for thread-local data, as well removing the need to track those cache lines in sparse directories. Our strategy is to allocate directory state only on a miss from a node in a different affinity domain from the directory. We call this ALLocAte on Remote Miss, or ALLARM. Our solution is entirely backward compatible with existing operating systems and software, and provides a means to scale cache coherence into the many-core era. On a mix of SPLASH2 and Parsec workloads, ALLARM is able to improve performance by 13% on average while reducing dynamic energy consumption by 9% in the on-chip network and 15% in the directory controller. This is achieved through a 46% reduction in the number of sparse directory entries evicted.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

自引率

0.00%

发文量

期刊最新文献

Simple interpolants for linear arithmetic Modeling steep slope devices: From circuits to architectures Software-based Pauli tracking in fault-tolerant quantum circuits Using guided local search for adaptive resource reservation in large-scale embedded systems Emulation-based robustness assessment for automotive smart-power ICs