Micro-architecture independent analytical processor performance and power modeling

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Pub Date: 2015-03-29. DOI: 10.1109/ISPASS.2015.7095782

S. V. D. Steen, S. D. Pestel, Moncef Mechri, Stijn Eyerman, Trevor E. Carlson, D. Black-Schaffer, Erik Hagersten, L. Eeckhout


Citations: 40

Abstract

Optimizing processors for specific application(s) can substantially improve energy-efficiency. With the end of Dennard scaling, and the corresponding reduction in energy-efficiency gains from technology scaling, such approaches may become increasingly important. However, designing application-specific processors requires fast design space exploration tools to optimize for the targeted application(s). Analytical models can be a good fit for such design space exploration, as they provide fast performance estimates and insight into the interaction between an application's characteristics and the micro-architecture of a processor. Unfortunately, current analytical models require some micro-architecture dependent inputs, such as cache miss rates, branch miss rates and memory-level parallelism. Obtaining these inputs requires profiling the applications for each cache and branch predictor configuration, which is far more time-consuming than evaluating the performance models themselves. In this work we present a micro-architecture independent profiler and associated analytical models that allow us to produce performance and power estimates across a large design space almost instantaneously. We show that using a micro-architecture independent profile leads to a speedup of 25× for our evaluated design space, compared to an analytical model that uses micro-architecture dependent profiles. Over a large design space, the model has a 13% error for performance and a 7% error for power, compared to cycle-level simulation. The model is able to accurately determine the optimal processor configuration for different applications under power or performance constraints, and it can provide insight into performance through cycle stacks.
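The workflow the abstract describes (profile an application once, then evaluate many processor configurations analytically and pick the best one under a power constraint) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the `Profile` fields, the penalty constants, the `power_watts` formula and the three-component cycle stack are hypothetical and are not the paper's actual profiler or model. The point is only that no per-configuration profiling run appears inside the exploration loop.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Profile:
    """Hypothetical micro-architecture independent profile, collected once."""
    instructions: int
    ilp: float               # independent instructions available per cycle
    branch_fraction: float   # branches per instruction
    branch_entropy: float    # 0..1, higher means harder to predict
    mem_fraction: float      # memory accesses per instruction
    miss_by_cache_kb: dict   # cache size (KB) -> fraction of accesses that miss


@dataclass(frozen=True)
class Config:
    """One point in the design space (all parameters are illustrative)."""
    width: int          # dispatch width
    cache_kb: int       # cache capacity
    predictor_kb: int   # branch predictor budget


BRANCH_PENALTY = 10   # cycles per misprediction (made-up constant)
MEM_LATENCY = 100     # cycles per cache miss (made-up constant)


def cycle_stack(p: Profile, c: Config) -> dict:
    """Toy analytical model: split predicted cycles into additive components."""
    base = p.instructions / min(c.width, p.ilp)
    branch_miss_rate = p.branch_entropy / (1 + c.predictor_kb)
    branch = p.instructions * p.branch_fraction * branch_miss_rate * BRANCH_PENALTY
    memory = (p.instructions * p.mem_fraction
              * p.miss_by_cache_kb[c.cache_kb] * MEM_LATENCY)
    return {"base": base, "branch": branch, "memory": memory}


def power_watts(c: Config) -> float:
    """Toy power estimate that grows with every structure size."""
    return 0.5 + 0.4 * c.width + 0.01 * c.cache_kb + 0.05 * c.predictor_kb


def explore(p: Profile, configs, power_budget: float) -> Config:
    """Return the feasible configuration with the fewest predicted cycles."""
    feasible = [c for c in configs if power_watts(c) <= power_budget]
    return min(feasible, key=lambda c: sum(cycle_stack(p, c).values()))


# One profile is reused for every configuration in the design space.
profile = Profile(
    instructions=1_000_000, ilp=2.5, branch_fraction=0.2, branch_entropy=0.1,
    mem_fraction=0.3, miss_by_cache_kb={16: 0.10, 32: 0.05, 64: 0.02},
)
design_space = [Config(w, kb, bp)
                for w, kb, bp in product((2, 4), (16, 32, 64), (1, 4))]

if __name__ == "__main__":
    best = explore(profile, design_space, power_budget=2.0)
    print(best, cycle_stack(profile, best))
```

In this sketch the cycle stack is simply the per-component breakdown returned by `cycle_stack`, which is what makes it possible to see which component (base, branch, or memory) dominates for a given configuration.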



