Runahead Cache Misses Using Bloom Filter

2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT) Pub Date : 2016-12-01 DOI:10.1109/PDCAT.2016.017

Xi Tao, Qi Zeng, J. Peir, Shih-Lien Lu


Citations: 0

Abstract

To hide long memory latency and reduce memory bandwidth requirements, modern high-performance multi-core systems introduce a fourth-level cache (L4) to support parallel computation. However, an additional cache level raises the cache miss penalty, since a request must pass through every cache level before it reaches main memory. In this paper, we introduce a new way of using a Bloom Filter (BF) to predict cache misses at any cache level in a multi-core system. Predicted misses can run ahead to access lower-level caches or memory, reducing the miss penalty. The proposed hashing scheme extends the cache index of the target set and uses the extended index to access the BF array, which avoids the need for counters in the BF array. Performance evaluation using a set of SPEC2006 benchmarks on 8-core systems with a 4-level cache hierarchy shows that using a BF at the third-level (L3) cache to filter L3 misses and let them run ahead improves IPC by 4-20%, with an average of 10.5%. In comparison with the delay-recalibration scheme, the improvement is 3.5-4.8%.
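
The abstract does not spell out the hashing scheme, so the following Python sketch is only one plausible reading of it, not the authors' design. It assumes the filter is indexed by the cache set index extended with a few extra address bits, so that every block sharing a filter bit also shares a cache set. Under that assumption, an eviction can decide whether to clear a bit by scanning the victim's own set, with no per-entry counter. The class name `SetAssocCache`, its parameters, the LRU policy, and the single-hash filter are all hypothetical choices made for illustration.

```python
class SetAssocCache:
    """Toy set-associative cache with a counter-free miss filter (illustrative only)."""

    def __init__(self, num_sets=4, ways=2, extra_bits=2):
        # All three sizes are hypothetical; the source gives no configuration.
        self.num_sets = num_sets
        self.ways = ways
        self.bf_size = num_sets << extra_bits
        # Each set holds block addresses in LRU order (oldest first).
        self.sets = [[] for _ in range(num_sets)]
        # One filter bit per extended index value.
        self.bf = [False] * self.bf_size

    def set_index(self, block):
        return block % self.num_sets

    def bf_index(self, block):
        # Extended index: num_sets divides bf_size, so two blocks with the
        # same filter index always map to the same cache set.
        return block % self.bf_size

    def predict_miss(self, block):
        # A clear bit means no resident block maps here: a certain miss.
        # A set bit only means "possibly present" (aliasing can occur).
        return not self.bf[self.bf_index(block)]

    def access(self, block):
        """Return True on a hit; on a miss, fill the block, evicting LRU if full."""
        lines = self.sets[self.set_index(block)]
        if block in lines:
            lines.remove(block)
            lines.append(block)  # move to the most-recently-used position
            return True
        if len(lines) == self.ways:
            victim = lines.pop(0)
            # No counter needed: only blocks in this same set can share the
            # victim's filter bit, so scanning the set decides whether to clear it.
            v = self.bf_index(victim)
            self.bf[v] = any(self.bf_index(b) == v for b in lines)
        lines.append(block)
        self.bf[self.bf_index(block)] = True
        return False


if __name__ == "__main__":
    cache = SetAssocCache()
    cache.access(0)
    print(cache.predict_miss(4))   # block 4 has not been filled yet
    print(cache.predict_miss(16))  # block 16 aliases with block 0 in the filter
```

In this sketch a predicted miss is always a real miss, which is the property that makes it safe to forward the request to the next level early. The cost is the occasional aliased block, such as block 16 above, that is not flagged and therefore pays the full lookup latency.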

Source journal

2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)

Self-citation rate

0.00%

Number of publications