Considering all starting points for simultaneous multithreading simulation

Michael Van Biesbrouck, L. Eeckhout, B. Calder
Published in: 2006 IEEE International Symposium on Performance Analysis of Systems and Software
Publication date: 2006-03-19
DOI: https://doi.org/10.1109/ISPASS.2006.1620799
Citations: 35

Abstract

Commercial processors support simultaneous multithreading (SMT), yet little work has been done to provide representative simulation results for SMT. Given a workload, current simulation techniques typically run one combination of its programs from a specific starting offset, or run just one combination of samples across the benchmarks. We have found that architectural behavior and overall throughput can vary drastically depending on the starting points of the different benchmarks. Therefore, to completely evaluate the effect of an SMT architecture optimization on a workload, one would need to simulate many or all of the program combinations from different starting offsets. But exhaustively running all program combinations from many starting offsets is infeasible; even running single programs to completion is often infeasible with modern benchmarks. In this paper we propose an SMT simulation methodology that estimates the average performance over all possible starting points when running multiple programs concurrently on an SMT processor. The methodology builds on our prior co-phase matrix phase analysis and simulation infrastructure. It samples all of the unique phase combinations for a set of benchmarks to be run together. Once these phase combinations are sampled, our approach uses the samples, along with a trace of the phase behavior of each program, to provide a CPI estimate over all starting points. This all-starting-points CPI estimate is precisely calculated in just minutes.
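The idea of combining sampled phase combinations with per-program phase traces can be illustrated with a small sketch. This is not the authors' implementation; it is a simplified two-thread model under several assumptions that do not come from the abstract: each trace is a list of phase IDs for fixed-length instruction intervals (`INTERVAL` is a hypothetical value), the co-phase matrix is a dictionary mapping a phase pair to the per-thread IPC sampled for that pair, traces wrap around, a run ends once the first program has executed as many intervals as its trace contains, and CPI is taken as total cycles divided by total instructions across both threads. All names (`replay`, `average_cpi_all_starts`, `cophase`) are illustrative.

```python
from itertools import product

INTERVAL = 1000.0  # instructions per phase-trace interval (hypothetical value)


def replay(trace_a, trace_b, cophase, start_a, start_b):
    """Analytically replay two phase traces from the given start intervals.

    cophase maps (phase_a, phase_b) -> (ipc_a, ipc_b), with both IPCs > 0.
    Returns aggregate CPI: total cycles / total instructions of both threads.
    """
    traces = (trace_a, trace_b)
    pos = [start_a, start_b]      # current interval index of each thread
    rem = [INTERVAL, INTERVAL]    # instructions left in the current interval
    cycles = insts = 0.0
    finished_a = 0                # intervals completed by the first program
    while finished_a < len(trace_a):
        # Look up the sampled behavior of the current phase combination.
        key = tuple(traces[t][pos[t] % len(traces[t])] for t in (0, 1))
        ipc = cophase[key]
        # Advance until the first thread reaches the end of its interval.
        dt = min(rem[t] / ipc[t] for t in (0, 1))
        cycles += dt
        for t in (0, 1):
            executed = ipc[t] * dt
            insts += executed
            rem[t] -= executed
            if rem[t] <= 1e-9 * INTERVAL:  # interval done (tolerate rounding)
                pos[t] += 1
                rem[t] = INTERVAL
                if t == 0:
                    finished_a += 1
    return cycles / insts


def average_cpi_all_starts(trace_a, trace_b, cophase):
    """Average the replayed CPI over every pair of starting intervals."""
    starts = list(product(range(len(trace_a)), range(len(trace_b))))
    total = sum(replay(trace_a, trace_b, cophase, i, j) for i, j in starts)
    return total / len(starts)


# Made-up example data: two phases per program, four phase combinations.
example_cophase = {
    (0, 0): (1.0, 1.0),
    (0, 1): (1.5, 0.5),
    (1, 0): (0.5, 1.5),
    (1, 1): (0.8, 0.8),
}
print(average_cpi_all_starts([0, 0, 1], [0, 1], example_cophase))
```

In this sketch no detailed simulation happens during the replay: each phase combination is looked up in the pre-sampled matrix, so evaluating every starting pair costs only table lookups and arithmetic. That is one plausible reading of why an estimate over all starting points could be computed quickly once the samples exist.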