分层循环记帐:应用程序性能调优的新方法

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2015-03-29 DOI:10.1109/ISPASS.2015.7095790

A. Nowak, D. Levinthal, W. Zwaenepoel

{"title":"分层循环记帐:应用程序性能调优的新方法","authors":"A. Nowak, D. Levinthal, W. Zwaenepoel","doi":"10.1109/ISPASS.2015.7095790","DOIUrl":null,"url":null,"abstract":"To address the growing difficulty of performance debugging on modern processors with increasingly complex micro-architectures, we present Hierarchical Cycle Accounting (HCA), a structured, hierarchical, architecture-agnostic methodology for the identification of performance issues in workloads running on these modern processors. HCA reports to the user the cost of a number of execution components, such as load latency, memory bandwidth, instruction starvation, and branch misprediction. A critical novel feature of HCA is that all cost components are presented in the same unit, core pipeline cycles. Their relative importance can therefore be compared directly. These cost components are furthermore presented in a hierarchical fashion, with architecture-agnostic components at the top levels of the hierarchy and architecture-specific components at the bottom. This hierarchical structure is useful in guiding the performance debugging effort to the places where it can be the most effective. For a given architecture, the cost components are computed based on the observation of architecture-specific events, typically provided by a performance monitoring unit (PMU), and using a set of formulas to attribute a certain cost in cycles to each event. The selection of what PMU events to use, their validation, and the derivation of the formulas are done offline by an architecture expert, thereby freeing the non-expert from the burdensome and error-prone task of directly interpreting PMU data. We have implemented the HCA methodology in Gooda, a publicly available tool. We describe the application of Gooda to the analysis of several workloads in wide use, showing how HCA's features facilitated performance debugging for these applications. We also describe the discovery of relevant bugs in Intel hardware and the Linux Kernel as a result of using HCA.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Hierarchical cycle accounting: a new method for application performance tuning\",\"authors\":\"A. Nowak, D. Levinthal, W. Zwaenepoel\",\"doi\":\"10.1109/ISPASS.2015.7095790\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To address the growing difficulty of performance debugging on modern processors with increasingly complex micro-architectures, we present Hierarchical Cycle Accounting (HCA), a structured, hierarchical, architecture-agnostic methodology for the identification of performance issues in workloads running on these modern processors. HCA reports to the user the cost of a number of execution components, such as load latency, memory bandwidth, instruction starvation, and branch misprediction. A critical novel feature of HCA is that all cost components are presented in the same unit, core pipeline cycles. Their relative importance can therefore be compared directly. These cost components are furthermore presented in a hierarchical fashion, with architecture-agnostic components at the top levels of the hierarchy and architecture-specific components at the bottom. This hierarchical structure is useful in guiding the performance debugging effort to the places where it can be the most effective. For a given architecture, the cost components are computed based on the observation of architecture-specific events, typically provided by a performance monitoring unit (PMU), and using a set of formulas to attribute a certain cost in cycles to each event. The selection of what PMU events to use, their validation, and the derivation of the formulas are done offline by an architecture expert, thereby freeing the non-expert from the burdensome and error-prone task of directly interpreting PMU data. We have implemented the HCA methodology in Gooda, a publicly available tool. We describe the application of Gooda to the analysis of several workloads in wide use, showing how HCA's features facilitated performance debugging for these applications. We also describe the discovery of relevant bugs in Intel hardware and the Linux Kernel as a result of using HCA.\",\"PeriodicalId\":189378,\"journal\":{\"name\":\"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)\",\"volume\":\"46 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-03-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISPASS.2015.7095790\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPASS.2015.7095790","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

摘要

为了解决在具有日益复杂的微体系结构的现代处理器上进行性能调试日益困难的问题，我们提出了分层周期核算(HCA)，这是一种结构化的、分层的、与体系结构无关的方法，用于识别在这些现代处理器上运行的工作负载中的性能问题。HCA向用户报告许多执行组件的成本，例如负载延迟、内存带宽、指令饥饿和分支错误预测。HCA的一个重要特点是，所有的成本组成部分都在同一个单元中，即核心管道周期中。因此，可以直接比较它们的相对重要性。这些成本组件进一步以层次方式呈现，与体系结构无关的组件位于层次结构的顶层，而特定于体系结构的组件位于底层。这种分层结构有助于将性能调试工作引导到最有效的地方。对于给定的体系结构，成本组件是基于对特定于体系结构的事件的观察来计算的，这些事件通常由性能监视单元(PMU)提供，并使用一组公式将特定的周期成本归因于每个事件。要使用的PMU事件的选择、它们的验证以及公式的推导都由体系结构专家离线完成，从而将非专家从直接解释PMU数据的繁重且容易出错的任务中解放出来。我们已经在Gooda(一个公开可用的工具)中实现了HCA方法。我们描述了Gooda的应用程序，分析了几种广泛使用的工作负载，展示了HCA的特性如何促进这些应用程序的性能调试。我们还描述了由于使用HCA而在英特尔硬件和Linux内核中发现的相关错误。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Hierarchical cycle accounting: a new method for application performance tuning

To address the growing difficulty of performance debugging on modern processors with increasingly complex micro-architectures, we present Hierarchical Cycle Accounting (HCA), a structured, hierarchical, architecture-agnostic methodology for the identification of performance issues in workloads running on these modern processors. HCA reports to the user the cost of a number of execution components, such as load latency, memory bandwidth, instruction starvation, and branch misprediction. A critical novel feature of HCA is that all cost components are presented in the same unit, core pipeline cycles. Their relative importance can therefore be compared directly. These cost components are furthermore presented in a hierarchical fashion, with architecture-agnostic components at the top levels of the hierarchy and architecture-specific components at the bottom. This hierarchical structure is useful in guiding the performance debugging effort to the places where it can be the most effective. For a given architecture, the cost components are computed based on the observation of architecture-specific events, typically provided by a performance monitoring unit (PMU), and using a set of formulas to attribute a certain cost in cycles to each event. The selection of what PMU events to use, their validation, and the derivation of the formulas are done offline by an architecture expert, thereby freeing the non-expert from the burdensome and error-prone task of directly interpreting PMU data. We have implemented the HCA methodology in Gooda, a publicly available tool. We describe the application of Gooda to the analysis of several workloads in wide use, showing how HCA's features facilitated performance debugging for these applications. We also describe the discovery of relevant bugs in Intel hardware and the Linux Kernel as a result of using HCA.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

自引率

0.00%

发文量