Online Prediction of Applications Cache Utility

Miquel Moretó, F. Cazorla, Alex Ramírez, M. Valero
{"title":"Online Prediction of Applications Cache Utility","authors":"Miquel Moretó, F. Cazorla, Alex Ramírez, M. Valero","doi":"10.1109/ICSAMOS.2007.4285748","DOIUrl":null,"url":null,"abstract":"General purpose architectures are designed to offer average high performance regardless of the particular application that is being run. Performance and power inefficiencies appear as a consequence for some programs. Reconfigurable hardware (cache hierarchy, branch predictor, execution units, bandwidth, etc.) has been proposed to overcome these inefficiencies by dynamically adapting the architecture to the application needs. However, nearly all the proposals use indirect measures or heuristics of performance to decide new configurations, what may lead to inefficiencies. In this paper we propose a runtime mechanism that allows to predict the throughput of an application on an architecture using a reconfigurable L2 cache. L2 cache size varies at a way granularity and we predict the performance of the same application on all other L2 cache sizes at the same time. We obtain for different L2 cache sizes an average error of 3.11%, a maximum error of 16.4% and standard deviation of 3.7%. No profiling or operating system participation is needed in this mechanism. We also give a hardware implementation that allows to reduce the hardware cost under 0.4% of the total L2 size and maintains high accuracy. This mechanism can be used to reduce power consumption in single threaded architectures and improve performance in multithreaded architectures that dynamically partition shared L2 caches.","PeriodicalId":106933,"journal":{"name":"2007 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2007-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSAMOS.2007.4285748","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

Abstract

General purpose architectures are designed to offer high average performance regardless of the particular application being run. As a consequence, performance and power inefficiencies appear for some programs. Reconfigurable hardware (cache hierarchy, branch predictor, execution units, bandwidth, etc.) has been proposed to overcome these inefficiencies by dynamically adapting the architecture to the application's needs. However, nearly all proposals use indirect measures or heuristics of performance to decide on new configurations, which may lead to inefficiencies. In this paper we propose a runtime mechanism that predicts the throughput of an application on an architecture with a reconfigurable L2 cache. The L2 cache size varies at way granularity, and we predict the performance of the application on all other L2 cache sizes at the same time. Across the different L2 cache sizes we obtain an average error of 3.11%, a maximum error of 16.4%, and a standard deviation of 3.7%. The mechanism requires no profiling or operating system participation. We also present a hardware implementation that keeps the hardware cost below 0.4% of the total L2 size while maintaining high accuracy. This mechanism can be used to reduce power consumption in single-threaded architectures and to improve performance in multithreaded architectures that dynamically partition shared L2 caches.
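A common way to estimate an application's miss behaviour for every possible number of L2 ways in a single pass is to keep LRU stack-distance (Mattson) counters: each access records the depth at which it hits in the LRU stack of its set, and the miss count for a cache with k ways is the number of accesses whose stack distance is at least k. The sketch below is a minimal illustration of that general technique only; it is not the paper's hardware design, and the names used (StackDistanceMonitor, record_access, misses_with_ways) are hypothetical.

```cpp
// Illustrative sketch (not the paper's exact mechanism): an LRU stack-distance
// monitor for one group of cache sets. Counting hits at each stack depth lets
// us derive the predicted miss count for every way count k at once.
#include <cstdint>
#include <iostream>
#include <list>
#include <vector>

class StackDistanceMonitor {
public:
    explicit StackDistanceMonitor(int max_ways)
        : hits_at_depth_(max_ways, 0), misses_beyond_(0), max_ways_(max_ways) {}

    // Record one access to a cache line tag: find its depth in the LRU stack
    // (0 = MRU), update the hit histogram, and move the tag to the MRU slot.
    void record_access(uint64_t tag) {
        int depth = 0;
        for (auto it = lru_stack_.begin(); it != lru_stack_.end(); ++it, ++depth) {
            if (*it == tag) {                  // hit at stack depth 'depth'
                hits_at_depth_[depth]++;
                lru_stack_.erase(it);
                lru_stack_.push_front(tag);
                return;
            }
        }
        misses_beyond_++;                      // misses for any k <= max_ways
        lru_stack_.push_front(tag);
        if (static_cast<int>(lru_stack_.size()) > max_ways_) lru_stack_.pop_back();
    }

    // Predicted misses if the cache had exactly k ways: every access whose
    // stack distance is >= k would miss.
    uint64_t misses_with_ways(int k) const {
        uint64_t misses = misses_beyond_;
        for (int depth = k; depth < max_ways_; ++depth) misses += hits_at_depth_[depth];
        return misses;
    }

private:
    std::list<uint64_t> lru_stack_;            // tags ordered MRU -> LRU
    std::vector<uint64_t> hits_at_depth_;      // hit counts per stack depth
    uint64_t misses_beyond_;                   // accesses deeper than max_ways
    int max_ways_;
};

int main() {
    StackDistanceMonitor mon(8);               // assume an 8-way L2 as the maximum
    uint64_t trace[] = {1, 2, 3, 1, 4, 2, 5, 1, 6, 3};
    for (uint64_t tag : trace) mon.record_access(tag);
    for (int k = 1; k <= 8; ++k)
        std::cout << "ways=" << k << " predicted misses=" << mon.misses_with_ways(k) << "\n";
    return 0;
}
```

In hardware, a monitor of this kind would typically sample only a small subset of cache sets and use narrow saturating counters, which is how the storage overhead can be kept to a small fraction of the total L2 size.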