Runahead Cache Misses Using Bloom Filter
Xi Tao, Qi Zeng, J. Peir, Shih-Lien Lu
2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), published 2016-12-01
DOI: 10.1109/PDCAT.2016.017 (https://doi.org/10.1109/PDCAT.2016.017)

To hide long memory latency and alleviate memory bandwidth requirements, a fourth-level cache (L4) has been introduced in modern high-performance multi-core systems to support parallel computation. However, an additional cache level increases the cache miss penalty, since a request must pass through every cache level before it reaches main memory. In this paper, we introduce a new way of using a Bloom Filter (BF) to predict cache misses at any cache level in a multicore system. Requests predicted to miss can run ahead to access lower-level caches or memory, reducing the miss penalty. The proposed hashing scheme extends the cache index of the target set and uses the extended index to access the BF array, which avoids the need for counters in the BF array. Performance evaluation using a set of SPEC2006 benchmarks on 8-core systems with a 4-level cache hierarchy shows that using a BF at the third-level (L3) cache to filter L3 misses and let them run ahead improves IPC by 4-20%, with an average of 10.5%. Compared with the delay-recalibration scheme, the improvement is 3.5-4.8%.
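
The counter-free filter described in the abstract can be illustrated with a minimal sketch. It rests on one reading of the abstract rather than on the paper's actual design: the BF index is assumed to be the cache set index extended by a few low-order tag bits, so every block that maps to a given BF bit also maps to the same cache set. Under that assumption, the bit can be recomputed on eviction by scanning the remaining lines of that one set, with no per-bit counter. The class name `BloomFilteredCache`, its parameters, and the LRU replacement policy are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class BloomFilteredCache:
    """Set-associative LRU cache with a counter-free miss-predicting bit array.

    Hypothetical sketch: names, sizes, and LRU policy are assumptions.
    """

    def __init__(self, num_sets=4, ways=2, block_bits=6, extra_bits=2):
        assert num_sets & (num_sets - 1) == 0, "num_sets must be a power of two"
        self.ways = ways
        self.block_bits = block_bits
        self.set_mask = num_sets - 1
        # Assumed hashing: BF index = set index extended by `extra_bits`
        # low-order tag bits, so blocks sharing a BF bit share a cache set.
        self.bf_mask = num_sets * (1 << extra_bits) - 1
        self.bf = [False] * (self.bf_mask + 1)
        # One LRU-ordered dict of resident block numbers per set.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _block(self, addr):
        return addr >> self.block_bits

    def predicts_miss(self, addr):
        """True means a guaranteed miss; False means the block may be present."""
        return not self.bf[self._block(addr) & self.bf_mask]

    def access(self, addr):
        """Look up addr, fill the cache on a miss, and return True on a hit."""
        block = self._block(addr)
        lines = self.sets[block & self.set_mask]
        if block in lines:
            lines.move_to_end(block)  # mark as most recently used
            return True
        if len(lines) >= self.ways:
            victim, _ = lines.popitem(last=False)  # evict least recently used
            v_idx = victim & self.bf_mask
            # No counter needed: only lines left in this same set can still
            # map to the victim's BF bit, so recompute the bit from them.
            self.bf[v_idx] = any(b & self.bf_mask == v_idx for b in lines)
        lines[block] = True
        self.bf[block & self.bf_mask] = True
        return False
```

In this sketch, a request for which `predicts_miss` returns `True` is certain to miss at this level and could be forwarded to the next level immediately. A `False` result only means the block may be present, so false positives are possible but false negatives are not.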