Considering all starting points for simultaneous multithreading simulation

2006 IEEE International Symposium on Performance Analysis of Systems and Software Pub Date : 2006-03-19 DOI:10.1109/ISPASS.2006.1620799

Michael Van Biesbrouck, L. Eeckhout, B. Calder


Citations: 35

Abstract

Commercial processors support simultaneous multithreading (SMT), yet little work has been done to provide representative simulation results for SMT. Given a workload, current simulation techniques typically run one combination of its programs from a specific starting offset, or run just one combination of samples across the benchmarks. We have found that the architecture behavior and overall throughput can vary drastically with the starting points of the different benchmarks. Therefore, to completely evaluate the effect of an SMT architecture optimization on a workload, one would need to simulate many or all of the program combinations from different starting offsets. But exhaustively running all program combinations from many starting offsets is infeasible; even running single programs to completion is often infeasible with modern benchmarks. In this paper we propose an SMT simulation methodology that estimates the average performance over all possible starting points when running multiple programs concurrently on an SMT processor. The methodology builds on our prior co-phase matrix phase analysis and simulation infrastructure. It samples all of the unique phase combinations for a set of benchmarks to be run together. Once these phase combinations are sampled, our approach uses the samples, along with a trace of the phase behavior of each program, to provide a CPI estimate over all starting points. This all-starting-points CPI estimate is precisely calculated in just minutes.
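The abstract does not spell out the estimation procedure, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes two programs, a per-program phase trace with one phase ID per fixed-length instruction interval, and a co-phase matrix that maps each phase pair to sampled per-thread IPC values. The names `corun_cpi`, `all_starting_points_cpi`, `cophase_ipc`, the wrap-around of traces, the instruction budget, and the definition of CPI as cycles divided by total retired instructions are all assumptions made for this example. It enumerates every starting-offset pair by brute force, which is only practical for toy traces.

```python
from itertools import product

EPS = 1e-9  # tolerance for floating-point interval boundaries


def corun_cpi(trace_a, trace_b, cophase_ipc, start_a, start_b, interval_len=1.0):
    """Analytically replay one co-run from the given starting intervals.

    No detailed simulation happens here: progress is computed purely from
    the (assumed) sampled IPC values stored in the co-phase matrix.
    """
    traces = (trace_a, trace_b)
    pos = [start_a, start_b]             # current interval index per thread
    left = [interval_len, interval_len]  # instructions left in that interval
    # Assumed stopping rule: retire as many instructions as both traces hold.
    budget = interval_len * (len(trace_a) + len(trace_b))
    cycles = 0.0
    retired = 0.0
    while retired < budget - EPS:
        ipc = cophase_ipc[(trace_a[pos[0]], trace_b[pos[1]])]
        # Cycles until the first thread reaches its next interval boundary,
        # capped so that the run stops exactly at the instruction budget.
        dt = min(left[0] / ipc[0], left[1] / ipc[1],
                 (budget - retired) / (ipc[0] + ipc[1]))
        cycles += dt
        for t in range(2):
            done = ipc[t] * dt
            retired += done
            left[t] -= done
            if left[t] <= EPS:
                # Thread t enters its next interval; traces wrap around.
                pos[t] = (pos[t] + 1) % len(traces[t])
                left[t] = interval_len
    return cycles / retired


def all_starting_points_cpi(trace_a, trace_b, cophase_ipc):
    """Average the co-run CPI over every pair of starting intervals."""
    starts = list(product(range(len(trace_a)), range(len(trace_b))))
    cpis = [corun_cpi(trace_a, trace_b, cophase_ipc, sa, sb) for sa, sb in starts]
    return sum(cpis) / len(cpis)


# Toy inputs (made-up numbers, for illustration only).
toy_trace_a = [0, 0, 1]
toy_trace_b = [0, 1]
toy_cophase_ipc = {
    (0, 0): (1.0, 1.0),
    (0, 1): (1.0, 0.5),
    (1, 0): (0.5, 1.0),
    (1, 1): (0.5, 0.5),
}
toy_estimate = all_starting_points_cpi(toy_trace_a, toy_trace_b, toy_cophase_ipc)
print(f"average CPI over all starting points: {toy_estimate:.4f}")
```

The point of the sketch is that once every phase combination has a sampled entry in the matrix, each starting-offset pair can be evaluated with simple arithmetic over the phase traces instead of a fresh detailed simulation.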





Latest articles from this venue

Accelerating architectural exploration using canonical instruction segments
Simulation sampling with live-points
Characterizing the branch misprediction penalty
Friendly fire: understanding the effects of multiprocessor prefetches
Evaluating the efficacy of statistical simulation for design space exploration