Addressing Energy Challenges in Filter Caches

Ricardo Alves, Nikos Nikoleris, S. Kaxiras, D. Black-Schaffer
{"title":"解决过滤器缓存中的能源挑战","authors":"Ricardo Alves, Nikos Nikoleris, S. Kaxiras, D. Black-Schaffer","doi":"10.1109/SBAC-PAD.2017.14","DOIUrl":null,"url":null,"abstract":"Filter caches and way-predictors are common approaches to improve the efficiency and/or performance of first-level caches. Filter caches use a small L0 to provide more efficient and faster access to a small subset of the data, and work well for programs with high locality. Way-predictors improve efficiency by accessing only the way predicted, which alleviates the need to read all ways in parallel without increasing latency, but hurts performance due to mispredictions.In this work we examine how SRAM layout constraints (h-trees and data mapping inside the cache) affect way-predictors and filter caches. We show that accessing the smaller L0 array can be significantly more energy efficient than attempting to read fewer ways from a larger L1 cache; and that the main source of energy inefficiency in filter caches comes from L0 and L1 misses. We propose a filter cache optimization that shares the tag array between the L0 and the L1, which incurs the overhead of reading the larger tag array on every access, but in return allows us to directly access the correct L1 way on each L0 miss. This optimization does not add any extra latency and counter-intuitively, improves the filter caches overall energy efficiency beyond that of the way-predictor.By combining the low power benefits of a physically smaller L0 with the reduction in miss energy by reading L1 tags upfront in parallel with L0 data, we show that the optimized filter cache reduces the dynamic cache energy compared to a traditional filter cache by 26% while providing the same performance advantage. Compared to a way-predictor, the optimized cache improves performance by 6% and energy by 2%.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Addressing Energy Challenges in Filter Caches\",\"authors\":\"Ricardo Alves, Nikos Nikoleris, S. Kaxiras, D. Black-Schaffer\",\"doi\":\"10.1109/SBAC-PAD.2017.14\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Filter caches and way-predictors are common approaches to improve the efficiency and/or performance of first-level caches. Filter caches use a small L0 to provide more efficient and faster access to a small subset of the data, and work well for programs with high locality. Way-predictors improve efficiency by accessing only the way predicted, which alleviates the need to read all ways in parallel without increasing latency, but hurts performance due to mispredictions.In this work we examine how SRAM layout constraints (h-trees and data mapping inside the cache) affect way-predictors and filter caches. We show that accessing the smaller L0 array can be significantly more energy efficient than attempting to read fewer ways from a larger L1 cache; and that the main source of energy inefficiency in filter caches comes from L0 and L1 misses. We propose a filter cache optimization that shares the tag array between the L0 and the L1, which incurs the overhead of reading the larger tag array on every access, but in return allows us to directly access the correct L1 way on each L0 miss. 
This optimization does not add any extra latency and counter-intuitively, improves the filter caches overall energy efficiency beyond that of the way-predictor.By combining the low power benefits of a physically smaller L0 with the reduction in miss energy by reading L1 tags upfront in parallel with L0 data, we show that the optimized filter cache reduces the dynamic cache energy compared to a traditional filter cache by 26% while providing the same performance advantage. Compared to a way-predictor, the optimized cache improves performance by 6% and energy by 2%.\",\"PeriodicalId\":187204,\"journal\":{\"name\":\"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SBAC-PAD.2017.14\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBAC-PAD.2017.14","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

Filter caches and way-predictors are common approaches to improving the efficiency and/or performance of first-level caches. Filter caches use a small L0 to provide more efficient and faster access to a small subset of the data, and work well for programs with high locality. Way-predictors improve efficiency by accessing only the predicted way, which avoids reading all ways in parallel without increasing latency, but hurts performance on mispredictions.

In this work we examine how SRAM layout constraints (h-trees and data mapping inside the cache) affect way-predictors and filter caches. We show that accessing the smaller L0 array can be significantly more energy efficient than attempting to read fewer ways from a larger L1 cache, and that the main source of energy inefficiency in filter caches comes from L0 and L1 misses. We propose a filter cache optimization that shares the tag array between the L0 and the L1. This incurs the overhead of reading the larger tag array on every access, but in return allows us to directly access the correct L1 way on each L0 miss. The optimization adds no extra latency and, counter-intuitively, improves the filter cache's overall energy efficiency beyond that of the way-predictor.

By combining the low-power benefits of a physically smaller L0 with the reduction in miss energy from reading L1 tags up front in parallel with L0 data, we show that the optimized filter cache reduces dynamic cache energy by 26% compared to a traditional filter cache while providing the same performance advantage. Compared to a way-predictor, the optimized cache improves performance by 6% and reduces energy by 2%.
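To make the shared-tag mechanism concrete, below is a minimal behavioral sketch in Python. It is not the paper's hardware design: the class name, the 64-byte line size, the set/way geometry, and the naive L0 eviction are all illustrative assumptions. It models only the access flow the abstract describes: the shared L1 tag array is read on every access in parallel with the small L0 data array, so an L0 miss can read exactly one L1 data way instead of all ways.

```python
# Hypothetical sketch of a shared-tag filter cache lookup (names assumed,
# not from the paper). The returned "events" list records which SRAM
# arrays a real implementation would have activated for that access.

class SharedTagFilterCache:
    def __init__(self, l0_lines=8, l1_ways=4, l1_sets=64, line_size=64):
        self.l0 = {}                       # addr -> data (tiny L0 store)
        self.l0_capacity = l0_lines
        self.l1_ways = l1_ways
        self.l1_sets = l1_sets
        self.line_size = line_size
        # The single tag array shared by L0 and L1: tags[set][way] -> tag
        self.l1_tags = [[None] * l1_ways for _ in range(l1_sets)]
        self.l1_data = {}                  # (set, way) -> data

    def access(self, addr):
        """Return (data, events) for one cache access."""
        block = addr // self.line_size
        s, tag = block % self.l1_sets, block // self.l1_sets
        # The shared L1 tag array is probed on EVERY access, in parallel
        # with the L0 data read: this is the per-access overhead...
        hit_way = next((w for w in range(self.l1_ways)
                        if self.l1_tags[s][w] == tag), None)
        if addr in self.l0:                # L0 hit: small, cheap array
            return self.l0[addr], ["l0_data", "l1_tags"]
        # ...but on an L0 miss the correct way is already known, so only
        # ONE L1 data way is read instead of all self.l1_ways in parallel.
        if hit_way is not None:
            data = self.l1_data[(s, hit_way)]
            self._fill_l0(addr, data)
            return data, ["l0_data", "l1_tags", "l1_data_way"]
        return None, ["l0_data", "l1_tags"]  # miss to next level (not modeled)

    def _fill_l0(self, addr, data):
        if len(self.l0) >= self.l0_capacity:
            self.l0.pop(next(iter(self.l0)))  # naive FIFO eviction
        self.l0[addr] = data

# Usage: pre-populate one L1 line, then observe the two access paths.
c = SharedTagFilterCache()
c.l1_tags[0][2] = 0            # pretend addr 0's line lives in set 0, way 2
c.l1_data[(0, 2)] = "lineA"
print(c.access(0))             # L0 miss -> tags + exactly one L1 way read
print(c.access(0))             # L0 hit  -> only L0 data + shared tags read
```

In this toy model the energy argument shows up in the events lists: a conventional filter cache would read all L1 data ways on an L0 miss, while the shared-tag variant pays one extra tag read per access in exchange for a single-way L1 read.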