亲爱的，我缩小了一致性目录

2011 International Conference on Parallel Architectures and Compilation Techniques Pub Date : 2011-10-10 DOI:10.1109/PACT.2011.10

Hongzhou Zhao, Arrvindh Shriraman, S. Dwarkadas, V. Srinivasan

{"title":"亲爱的，我缩小了一致性目录","authors":"Hongzhou Zhao, Arrvindh Shriraman, S. Dwarkadas, V. Srinivasan","doi":"10.1109/PACT.2011.10","DOIUrl":null,"url":null,"abstract":"One of the key scalability challenges of on-chip coherence in a multicore chip is the coherence directory, which provides information on sharing of cache blocks. Shadow tags that duplicate entire private cache tag arrays are widely used to minimize area overhead, but require an energy-intensive associative search to obtain the sharing information. Recent research proposed a Tagless directory, which uses bloom filters to summarize the tags in a cache set. The Tagless directory associates the sharing vector with the bloom filter buckets to completely eliminate the associative lookup and reduce the directory overhead. However, Tagless still uses a full map sharing vector to represent the sharing information, resulting in remaining area and energy challenges with increasing core counts. In this paper, we first show that due to the regular nature of applications, many bloom filters essentially replicate the same sharing pattern. We next exploit the pattern commonality and propose SPATL (Sharing-pattern based Tagless Directory). SPATL exploits the sharing pattern commonality to decouple the sharing patterns from the bloom filters and eliminates the redundant copies of sharing patterns. SPATL works with both inclusive and noninclusive shared caches and provides 34% storage savings over Tagless, the previous most storage-efficient directory, at 16 cores. We study multiple strategies to periodically eliminate the false sharing that comes from combining sharing pattern compression with Tagless, and demonstrate that SPATL can achieve the same level of false sharers as Tagless with 5% extra bandwidth. Finally, we demonstrate that SPATL scales even better than an idealized directory and can support 1024-core chips with less than 1% of the private cache space for data parallel applications.","PeriodicalId":106423,"journal":{"name":"2011 International Conference on Parallel Architectures and Compilation Techniques","volume":"325 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"56","resultStr":"{\"title\":\"SPATL: Honey, I Shrunk the Coherence Directory\",\"authors\":\"Hongzhou Zhao, Arrvindh Shriraman, S. Dwarkadas, V. Srinivasan\",\"doi\":\"10.1109/PACT.2011.10\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"One of the key scalability challenges of on-chip coherence in a multicore chip is the coherence directory, which provides information on sharing of cache blocks. Shadow tags that duplicate entire private cache tag arrays are widely used to minimize area overhead, but require an energy-intensive associative search to obtain the sharing information. Recent research proposed a Tagless directory, which uses bloom filters to summarize the tags in a cache set. The Tagless directory associates the sharing vector with the bloom filter buckets to completely eliminate the associative lookup and reduce the directory overhead. However, Tagless still uses a full map sharing vector to represent the sharing information, resulting in remaining area and energy challenges with increasing core counts. In this paper, we first show that due to the regular nature of applications, many bloom filters essentially replicate the same sharing pattern. We next exploit the pattern commonality and propose SPATL (Sharing-pattern based Tagless Directory). SPATL exploits the sharing pattern commonality to decouple the sharing patterns from the bloom filters and eliminates the redundant copies of sharing patterns. SPATL works with both inclusive and noninclusive shared caches and provides 34% storage savings over Tagless, the previous most storage-efficient directory, at 16 cores. We study multiple strategies to periodically eliminate the false sharing that comes from combining sharing pattern compression with Tagless, and demonstrate that SPATL can achieve the same level of false sharers as Tagless with 5% extra bandwidth. Finally, we demonstrate that SPATL scales even better than an idealized directory and can support 1024-core chips with less than 1% of the private cache space for data parallel applications.\",\"PeriodicalId\":106423,\"journal\":{\"name\":\"2011 International Conference on Parallel Architectures and Compilation Techniques\",\"volume\":\"325 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"56\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 International Conference on Parallel Architectures and Compilation Techniques\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PACT.2011.10\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 International Conference on Parallel Architectures and Compilation Techniques","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PACT.2011.10","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 56

摘要

在多核芯片中实现片上一致性的关键可扩展性挑战之一是一致性目录，它提供了关于缓存块共享的信息。复制整个私有缓存标记数组的影子标记被广泛用于最小化区域开销，但需要能量密集的关联搜索来获取共享信息。最近的研究提出了一种无标签目录，它使用bloom过滤器来总结缓存集中的标签。无标签目录将共享向量与bloom过滤器桶关联起来，以完全消除关联查找并减少目录开销。然而，Tagless仍然使用完整的地图共享向量来表示共享信息，导致随着核心数量的增加，剩余面积和能量的挑战。在本文中，我们首先表明，由于应用程序的规则性质，许多bloom过滤器本质上复制相同的共享模式。接下来，我们利用模式通用性，提出了基于共享模式的无标签目录(SPATL)。SPATL利用共享模式的通用性将共享模式与布隆过滤器解耦，并消除了共享模式的冗余副本。SPATL可以与包含和不包含的共享缓存一起工作，并且比Tagless(以前存储效率最高的目录)节省34%的存储，只有16核。研究了将共享模式压缩与Tagless相结合所产生的周期性错误共享的多种策略，并证明SPATL可以在额外5%的带宽下实现与Tagless相同级别的错误共享。最后，我们证明了SPATL的可伸缩性甚至比理想的目录更好，并且可以用不到1%的私有缓存空间为数据并行应用程序支持1024核芯片。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

SPATL: Honey, I Shrunk the Coherence Directory

One of the key scalability challenges of on-chip coherence in a multicore chip is the coherence directory, which provides information on sharing of cache blocks. Shadow tags that duplicate entire private cache tag arrays are widely used to minimize area overhead, but require an energy-intensive associative search to obtain the sharing information. Recent research proposed a Tagless directory, which uses bloom filters to summarize the tags in a cache set. The Tagless directory associates the sharing vector with the bloom filter buckets to completely eliminate the associative lookup and reduce the directory overhead. However, Tagless still uses a full map sharing vector to represent the sharing information, resulting in remaining area and energy challenges with increasing core counts. In this paper, we first show that due to the regular nature of applications, many bloom filters essentially replicate the same sharing pattern. We next exploit the pattern commonality and propose SPATL (Sharing-pattern based Tagless Directory). SPATL exploits the sharing pattern commonality to decouple the sharing patterns from the bloom filters and eliminates the redundant copies of sharing patterns. SPATL works with both inclusive and noninclusive shared caches and provides 34% storage savings over Tagless, the previous most storage-efficient directory, at 16 cores. We study multiple strategies to periodically eliminate the false sharing that comes from combining sharing pattern compression with Tagless, and demonstrate that SPATL can achieve the same level of false sharers as Tagless with 5% extra bandwidth. Finally, we demonstrate that SPATL scales even better than an idealized directory and can support 1024-core chips with less than 1% of the private cache space for data parallel applications.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 International Conference on Parallel Architectures and Compilation Techniques

自引率

0.00%

发文量