更快的缓存合并功能升温

Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems Pub Date : 2023-01-17 DOI:10.1145/3579170.3579256

Gustaf Borgström, C. Rohner, D. Black-Schaffer

{"title":"更快的缓存合并功能升温","authors":"Gustaf Borgström, C. Rohner, D. Black-Schaffer","doi":"10.1145/3579170.3579256","DOIUrl":null,"url":null,"abstract":"Smarts-like sampled hardware simulation techniques achieve good accuracy by simulating many small portions of an application in detail. However, while this reduces the simulation time, it results in extensive cache warming times, as each of the many simulation points requires warming the whole memory hierarchy. Adaptive Cache Warming reduces this time by iteratively increasing warming to achieve sufficient accuracy. Unfortunately, each increases requires that the previous warming be redone, nearly doubling the total warming. We address re-warming by developing a technique to merge the cache states from the previous and additional warming iterations. We demonstrate our merging approach on multi-level LRU cache hierarchy and evaluate and address the introduced errors. Our experiments show that Cache Merging delivers an average speedup of 1.44 ×, 1.84 ×, and 1.87 × for 128kB, 2MB, and 8MB L2 caches, respectively, (vs. a 2 × theoretical maximum speedup) with 95-percentile absolute IPC errors of only 0.029, 0.015, and 0.006, respectively. These results demonstrate that Cache Merging yields significantly higher simulation speed with minimal losses.","PeriodicalId":153341,"journal":{"name":"Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Faster Functional Warming with Cache Merging\",\"authors\":\"Gustaf Borgström, C. Rohner, D. Black-Schaffer\",\"doi\":\"10.1145/3579170.3579256\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Smarts-like sampled hardware simulation techniques achieve good accuracy by simulating many small portions of an application in detail. However, while this reduces the simulation time, it results in extensive cache warming times, as each of the many simulation points requires warming the whole memory hierarchy. Adaptive Cache Warming reduces this time by iteratively increasing warming to achieve sufficient accuracy. Unfortunately, each increases requires that the previous warming be redone, nearly doubling the total warming. We address re-warming by developing a technique to merge the cache states from the previous and additional warming iterations. We demonstrate our merging approach on multi-level LRU cache hierarchy and evaluate and address the introduced errors. Our experiments show that Cache Merging delivers an average speedup of 1.44 ×, 1.84 ×, and 1.87 × for 128kB, 2MB, and 8MB L2 caches, respectively, (vs. a 2 × theoretical maximum speedup) with 95-percentile absolute IPC errors of only 0.029, 0.015, and 0.006, respectively. These results demonstrate that Cache Merging yields significantly higher simulation speed with minimal losses.\",\"PeriodicalId\":153341,\"journal\":{\"name\":\"Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3579170.3579256\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3579170.3579256","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

类似智能的采样硬件仿真技术通过详细模拟应用程序的许多小部分来实现良好的准确性。然而，虽然这减少了模拟时间，但它导致了大量的缓存升温时间，因为许多模拟点中的每一个都需要升温整个内存层次结构。自适应缓存升温通过迭代增加升温来达到足够的精度，从而减少了这一时间。不幸的是，每次升温都需要之前的变暖重新开始，几乎使总变暖增加一倍。我们通过开发一种技术来合并来自先前和额外的升温迭代的缓存状态，从而解决重新升温问题。我们在多级LRU缓存层次结构上演示了我们的合并方法，并评估和解决了引入的错误。我们的实验表明，对于128kB、2MB和8MB的L2缓存，Cache合并分别提供了1.44倍、1.84倍和1.87倍的平均加速(相对于2倍的理论最大加速)，95%的绝对IPC误差分别只有0.029、0.015和0.006。这些结果表明，缓存合并以最小的损失显著提高了模拟速度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Faster Functional Warming with Cache Merging

Smarts-like sampled hardware simulation techniques achieve good accuracy by simulating many small portions of an application in detail. However, while this reduces the simulation time, it results in extensive cache warming times, as each of the many simulation points requires warming the whole memory hierarchy. Adaptive Cache Warming reduces this time by iteratively increasing warming to achieve sufficient accuracy. Unfortunately, each increases requires that the previous warming be redone, nearly doubling the total warming. We address re-warming by developing a technique to merge the cache states from the previous and additional warming iterations. We demonstrate our merging approach on multi-level LRU cache hierarchy and evaluate and address the introduced errors. Our experiments show that Cache Merging delivers an average speedup of 1.44 ×, 1.84 ×, and 1.87 × for 128kB, 2MB, and 8MB L2 caches, respectively, (vs. a 2 × theoretical maximum speedup) with 95-percentile absolute IPC errors of only 0.029, 0.015, and 0.006, respectively. These results demonstrate that Cache Merging yields significantly higher simulation speed with minimal losses.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems

自引率

0.00%

发文量

期刊最新文献

Automatic DRAM Subsystem Configuration with irace Datapath Optimization for Embedded Signal Processing Architectures utilizing Design Space Exploration Faster Functional Warming with Cache Merging Towards a European Network of Enabling Technologies for Drones Towards an Ontological Methodology for Dynamic Dependability Management of Unmanned Aerial Vehicles