2006 IEEE International Symposium on Performance Analysis of Systems and Software: latest articles

Considering all starting points for simultaneous multithreading simulation

2006 IEEE International Symposium on Performance Analysis of Systems and Software

Pub Date : 2006-03-19 DOI: 10.1109/ISPASS.2006.1620799

Michael Van Biesbrouck, L. Eeckhout, B. Calder

Commercial processors have support for simultaneous multithreading (SMT), yet little work has been done to provide representative simulation results for SMT. Given a workload, current simulation techniques typically run one combination of those programs from a specific starting offset, or just run one combination of samples across the benchmarks. We have found that the architectural behavior and overall throughput seen can vary drastically based upon the starting points of the different benchmarks. Therefore, to completely evaluate the effect of an SMT architecture optimization on a workload, one would need to simulate many or all of the program combinations from different starting offsets. But exhaustively running all program combinations from many starting offsets is infeasible; even running single programs to completion is often infeasible with modern benchmarks. In this paper we propose an SMT simulation methodology that estimates the average performance over all possible starting points when running multiple programs concurrently on an SMT processor. This is based on our prior co-phase matrix phase analysis and simulation infrastructure. This approach samples all of the unique phase combinations for a set of benchmarks to be run together. Once these phase combinations are sampled, our approach uses these samples, along with a trace of the phase behavior for each program, to provide a CPI estimate of all starting points. This all-starting-point CPI estimate is precisely calculated in just minutes.
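The idea of combining sampled phase combinations with per-program phase traces can be illustrated with a small sketch. This is not the authors' infrastructure: the function names, the fixed-size cycle step, the wrap-around of traces, and the definition of combined CPI as cycles per committed instruction across both threads are all assumptions made for illustration.

```python
from itertools import product


def walk_cpi(trace_a, trace_b, cophase, start_a, start_b,
             interval=100, step_cycles=50, n_steps=8):
    """Estimate combined CPI for one pair of starting intervals.

    trace_a, trace_b: phase IDs, one per fixed-length instruction interval.
    cophase: dict mapping (phase_a, phase_b) -> (ipc_a, ipc_b). Hypothetical
             stand-in for values sampled once per unique phase combination.
    """
    pos_a = start_a * interval  # instructions committed so far by thread A
    pos_b = start_b * interval  # instructions committed so far by thread B
    instructions = 0.0
    for _ in range(n_steps):
        # Find the phase each thread is currently in (traces wrap around).
        pa = trace_a[int(pos_a // interval) % len(trace_a)]
        pb = trace_b[int(pos_b // interval) % len(trace_b)]
        ipc_a, ipc_b = cophase[(pa, pb)]
        # Advance both threads analytically instead of simulating in detail.
        pos_a += ipc_a * step_cycles
        pos_b += ipc_b * step_cycles
        instructions += (ipc_a + ipc_b) * step_cycles
    return (n_steps * step_cycles) / instructions


def average_cpi_all_starts(trace_a, trace_b, cophase, **kwargs):
    """Average the estimate over every pair of starting intervals."""
    cpis = [walk_cpi(trace_a, trace_b, cophase, sa, sb, **kwargs)
            for sa, sb in product(range(len(trace_a)), range(len(trace_b)))]
    return sum(cpis) / len(cpis)


if __name__ == "__main__":
    # Made-up phase traces and co-phase values, for demonstration only.
    trace_a = [0, 0, 1]
    trace_b = [0, 1]
    cophase = {(0, 0): (2.0, 2.0), (0, 1): (2.0, 1.0),
               (1, 0): (1.0, 2.0), (1, 1): (1.0, 1.0)}
    print(average_cpi_all_starts(trace_a, trace_b, cophase))
```

Because each walk only performs table lookups, enumerating every starting pair stays cheap even when the detailed simulation of each unique phase combination is expensive.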

Citations: 35

Comparing multinomial and k-means clustering for SimPoint

Pub Date : 2006-03-19 DOI: 10.1109/ISPASS.2006.1620798

Greg Hamerly, Erez Perelman, B. Calder

SimPoint is a technique used to pick what parts of the program's execution to simulate in order to have a complete picture of execution. SimPoint uses data clustering algorithms from machine learning to automatically find repetitive (similar) patterns in a program's execution, and it chooses one sample to represent each unique repetitive behavior. Together these samples represent an accurate picture of the complete execution of the program. SimPoint is based on the k-means clustering algorithm; recent work proposed using a different clustering method based on multinomial models, but only provided a preliminary comparison and analysis. In this work we provide a detailed comparison of using k-means and multinomial clustering for SimPoint. We show that k-means performs better than the recently proposed multinomial clustering approach. We then propose two improvements to the prior multinomial clustering approach in the areas of feature reduction and the picking of simulation points which allow multinomial clustering to perform as well as k-means. We then conclude by examining how to potentially combine multinomial clustering with k-means.
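A minimal sketch of the k-means side of this comparison is shown below: cluster per-interval feature vectors, then pick the interval closest to each centroid as that cluster's representative, weighted by cluster size. The deterministic initialisation from the first k vectors, the helper names, and the toy vectors are assumptions for illustration. This is not the SimPoint tool itself and it omits steps such as dimensionality reduction and choosing k.

```python
def dist2(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with a deterministic start (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def pick_simulation_points(points, k):
    """Return one (interval_index, weight) pair per non-empty cluster."""
    centroids, assign = kmeans(points, k)
    result = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            rep = min(members, key=lambda i: dist2(points[i], centroids[c]))
            result.append((rep, len(members) / len(points)))
    return result


if __name__ == "__main__":
    # Toy per-interval feature vectors (made up), two obvious behaviours.
    vectors = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2],
               [0.9, 0.1], [0.1, 0.9], [0.2, 0.8]]
    print(pick_simulation_points(vectors, 2))
```

A whole-program metric can then be estimated as the weighted sum of the metric measured at each chosen interval.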

Citations: 7

Workload sanitation for performance evaluation

Pub Date : 2006-03-19 DOI: 10.1109/ISPASS.2006.1620806

D. Feitelson, Dan Tsafrir

The performance of computer systems depends, among other things, on the workload. Performance evaluations are therefore often done using logs of workloads on current production systems, under the assumption that such real workloads are representative and reliable; likewise, workload modeling is typically based on real workloads. We show, however, that real workloads may also contain anomalies that make them non-representative and unreliable. This is a special case of multi-class workloads, where one class is the "real" workload which we wish to use in the evaluation, and the other class contaminates the log with "bogus" data. We provide several examples of this situation, including a previously unrecognized type of anomaly we call "workload flurries": surges of activity with a repetitive nature, caused by a single user, that dominate the workload for a relatively short period. Using a workload with such anomalies in effect emphasizes rare and unique events (e.g., occurring for a few days out of two years of logged data), and risks optimizing the design decision for the anomalous workload at the expense of the normal workload. Thus we claim that such anomalies should be removed from the workload before it is used in evaluations, and that ignoring them is actually an unjustifiable approach.
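One simple way to picture the removal of a flurry is a filter that drops a user's jobs in any time window where that user both submits many jobs and accounts for most of the window's activity. The sketch below is an illustrative heuristic only, not the procedure from the paper: the function name, the one-day window, and both thresholds are hypothetical.

```python
from collections import Counter


def remove_flurries(jobs, window=86400, min_jobs=100, share=0.5):
    """Drop jobs belonging to a suspected flurry.

    jobs: list of (submit_time_seconds, user) tuples.
    A (window, user) pair is flagged when the user submitted at least
    min_jobs jobs in that window and more than `share` of all jobs in it.
    All three parameters are hypothetical defaults, not published values.
    """
    per_window = Counter()
    per_user = Counter()
    for t, user in jobs:
        w = t // window
        per_window[w] += 1
        per_user[(w, user)] += 1
    flagged = {key for key, n in per_user.items()
               if n >= min_jobs and n / per_window[key[0]] > share}
    return [(t, u) for t, u in jobs if (t // window, u) not in flagged]


if __name__ == "__main__":
    # Made-up log: one user floods day 0, then behaves normally on day 1.
    log = ([(i, "alice") for i in range(10)]
           + [(i, "bob") for i in range(200)]
           + [(86400 + i, "bob") for i in range(5)])
    print(len(log), "->", len(remove_flurries(log)))
```

Filtering per (window, user) pair rather than per user keeps the same user's ordinary activity outside the surge in the cleaned log.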

Citations: 53

Evaluating the efficacy of statistical simulation for design space exploration

Pub Date : 2006-03-19 DOI: 10.1109/ISPASS.2006.1620791

A. Joshi, J. Yi, R. Bell, L. Eeckhout, L. John, D. Lilja

Recent research has proposed statistical simulation as a technique for fast performance evaluation of superscalar microprocessors. The idea in statistical simulation is to measure a program's key performance characteristics, generate a synthetic trace with these characteristics, and simulate the synthetic trace. Due to the probabilistic nature of statistical simulation the performance estimate quickly converges to a solution, making it an attractive technique to efficiently cull a large microprocessor design space. In this paper, we evaluate the efficacy of statistical simulation in exploring the design space. Specifically, we characterize the following aspects of statistical simulation: (i) fidelity of performance bottlenecks, with respect to cycle-accurate simulation of the program, (ii) ability to track design changes, and (iii) trade-off between accuracy and complexity in statistical simulation models. In our characterization experiments, we use the Plackett & Burman (P&B) design to systematically stress statistical simulation by creating different performance bottlenecks. The key results from this paper are: (1) Synthetic traces stress at least the same 10 most significant processor performance bottlenecks as the original workload, (2) Statistical simulation can effectively track design changes to identify feasible design points in a large design space of aggressive microarchitectures, (3) Our evaluation of 4 statistical simulation models shows that although a very detailed model is needed to achieve a good absolute accuracy in performance estimation, a simple model is sufficient to achieve good relative accuracy, and (4) The P&B design technique can be used to quickly identify areas to focus on to improve the accuracy of the statistical simulation model.
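The measure-then-generate step of statistical simulation can be sketched in a few lines. The example below profiles only the instruction-type mix of a trace and samples a synthetic trace from it. That is a deliberately reduced, assumed model: real statistical profiles capture far more, such as dependency distances and cache and branch behaviour. The function names and the toy trace are hypothetical.

```python
import random
from collections import Counter


def profile_mix(trace):
    """Measure the fraction of each instruction type in a trace."""
    counts = Counter(trace)
    total = len(trace)
    return {op: n / total for op, n in counts.items()}


def synthetic_trace(mix, length, seed=0):
    """Sample a synthetic trace whose expected mix matches the profile."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=length)


if __name__ == "__main__":
    # Toy original trace (made up): 60% ALU, 30% load, 10% branch.
    original = ["alu"] * 6 + ["load"] * 3 + ["branch"]
    mix = profile_mix(original)
    synth = synthetic_trace(mix, 1000)
    print(mix)
    print(profile_mix(synth))
```

The synthetic trace can be much shorter than the original program while keeping the measured statistics, and it would then be fed to a simple trace-driven timing model.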

Citations: 20
