Jan-Patrick Lehr, A. Calotoiu, C. Bischof, F. Wolf
{"title":"Automatic Instrumentation Refinement for Empirical Performance Modeling","authors":"Jan-Patrick Lehr, A. Calotoiu, C. Bischof, F. Wolf","doi":"10.1109/ProTools49597.2019.00011","DOIUrl":null,"url":null,"abstract":"The analysis of runtime performance is important during the development and throughout the life cycle of HPC applications. One important objective in performance analysis is to identify regions in the code that show significant runtime increase with larger problem sizes or more processes. One approach to identify such regions is to use empirical performance modeling, i.e., building performance models based on measurements. While the modeling itself has already been streamlined and automated, the generation of the required measurements is time consuming and tedious. In this paper, we propose an approach to automatically adjust the instrumentation to reduce overhead and focus the measurements to relevant regions, i.e.,such that show increasing runtime with larger input parameters or increasing number of MPI ranks. Our approach employs Extra-P to generate performance models, which it then uses to extrapolate runtime and, finally, decide which functions should be kept for measurement. Also, the analysis expands the instrumentation, by heuristically adding functions based on static source-code features. We evaluate our approach using benchmarks from SPEC CPU 2006, SU2, and parallel MILC. The evaluation shows that our approach can filter functions of little interest and generate profiles that contain mostly relevant regions. For example, the overhead for SU2 can be improved automatically from 200% to 11% compared to filtered Score-P measurements.","PeriodicalId":418029,"journal":{"name":"2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ProTools49597.2019.00011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
The analysis of runtime performance is important during the development and throughout the life cycle of HPC applications. One important objective in performance analysis is to identify regions in the code that show significant runtime increase with larger problem sizes or more processes. One approach to identify such regions is to use empirical performance modeling, i.e., building performance models based on measurements. While the modeling itself has already been streamlined and automated, the generation of the required measurements is time consuming and tedious. In this paper, we propose an approach to automatically adjust the instrumentation to reduce overhead and focus the measurements to relevant regions, i.e.,such that show increasing runtime with larger input parameters or increasing number of MPI ranks. Our approach employs Extra-P to generate performance models, which it then uses to extrapolate runtime and, finally, decide which functions should be kept for measurement. Also, the analysis expands the instrumentation, by heuristically adding functions based on static source-code features. We evaluate our approach using benchmarks from SPEC CPU 2006, SU2, and parallel MILC. The evaluation shows that our approach can filter functions of little interest and generate profiles that contain mostly relevant regions. For example, the overhead for SU2 can be improved automatically from 200% to 11% compared to filtered Score-P measurements.
运行时性能分析在HPC应用程序的开发和整个生命周期中非常重要。性能分析的一个重要目标是确定代码中随着较大的问题规模或更多的进程而显着增加运行时的区域。识别这些区域的一种方法是使用经验性能建模,即基于测量建立性能模型。虽然建模本身已经被简化和自动化,但是所需度量的生成是耗时且乏味的。在本文中,我们提出了一种自动调整仪器的方法,以减少开销并将测量集中到相关区域,即随着输入参数的增加或MPI排名的增加,运行时间会增加。我们的方法使用Extra-P来生成性能模型,然后使用它来推断运行时,并最终决定应该保留哪些功能以进行度量。此外,通过启发式地添加基于静态源代码特性的函数,分析扩展了工具。我们使用SPEC CPU 2006、SU2和并行MILC的基准测试来评估我们的方法。评估表明,我们的方法可以过滤不感兴趣的函数,并生成包含大多数相关区域的轮廓。例如,与过滤后的Score-P测量值相比,SU2的开销可以自动从200%提高到11%。