Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs

2011 IEEE International Parallel & Distributed Processing Symposium. Pub Date: 2011-05-16. DOI: 10.1109/IPDPS.2011.67

Z. Szebenyi, T. Gamblin, M. Schulz, B. Supinski, F. Wolf, B. Wylie


Citations: 20

Abstract

We can profile the performance behavior of parallel programs at the level of individual call paths through sampling or direct instrumentation. While we can easily control measurement dilation by adjusting the sampling frequency, the statistical nature of sampling and the difficulty of accessing the parameters of sampled events make it unsuitable for obtaining certain communication metrics, such as the size of message payloads. Alternatively, direct instrumentation, which is preferable for capturing message-passing events, can excessively dilate measurements, particularly for C++ programs, which often have many short but frequently called class member functions. Thus, we combine these techniques in a unified framework that exploits the strengths of each approach while avoiding their weaknesses: We use direct instrumentation to intercept MPI routines while we record the execution of the remaining code through low-overhead sampling. One of the main technical hurdles mastered was the inexpensive and portable determination of call-path information during the invocation of MPI routines. We show that the overhead of our implementation is sufficiently low to support substantial performance improvement of a C++ fluid-dynamics code.
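The division of labor described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a single-process Python stand-in in which `fake_send` is a hypothetical placeholder for an MPI send, `HybridProfiler`, `call_path`, and `run_demo` are names invented for illustration, and the sampler relies on the CPython-specific `sys._current_frames()`. Communication calls are intercepted by a wrapper that records exact call counts and payload sizes per call path, while the rest of the code is observed only through periodic stack samples.

```python
import collections
import functools
import sys
import threading


def call_path(frame):
    """Walk from `frame` to the outermost caller; return function names, outermost first."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return tuple(reversed(names))


class HybridProfiler:
    """Toy profiler: samples ordinary code, directly instruments communication calls."""

    def __init__(self, interval=0.001):
        self.interval = interval                  # seconds between samples
        self.samples = collections.Counter()      # call path -> sample count (statistical)
        self.calls = collections.Counter()        # call path -> exact number of calls
        self.bytes_sent = collections.Counter()   # call path -> exact payload bytes
        self._stop = threading.Event()
        self._thread = None
        self._target = None

    def _sample_loop(self):
        # Periodically snapshot the profiled thread's stack (CPython-specific API).
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self._target)
            if frame is not None:
                self.samples[call_path(frame)] += 1

    def start(self):
        self._target = threading.get_ident()      # profile the calling thread
        self._stop.clear()
        self._thread = threading.Thread(target=self._sample_loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def instrument(self, func):
        """Wrap a communication routine so that every call is recorded exactly."""
        @functools.wraps(func)
        def wrapper(payload, *args, **kwargs):
            # Call path of the caller, extended by the intercepted routine itself.
            path = call_path(sys._getframe(1)) + (func.__name__,)
            self.calls[path] += 1
            self.bytes_sent[path] += len(payload)
            return func(payload, *args, **kwargs)
        return wrapper


def run_demo(steps=50):
    """Run a tiny compute/communicate loop under the toy profiler."""
    prof = HybridProfiler()

    @prof.instrument
    def fake_send(payload):        # hypothetical stand-in for an MPI send
        return len(payload)

    def compute():                 # ordinary code: visible only through samples
        return sum(i * i for i in range(2000))

    def exchange(n):
        fake_send(bytes(n))        # payload of n zero bytes

    def solve():
        for _ in range(steps):
            compute()
            exchange(64)

    prof.start()
    solve()
    prof.stop()
    return prof


if __name__ == "__main__":
    prof = run_demo()
    for path, nbytes in prof.bytes_sent.items():
        print(" > ".join(path[-3:]), "calls:", prof.calls[path], "bytes:", nbytes)
    print("samples taken:", sum(prof.samples.values()))
```

In this sketch the call and byte counts are exact regardless of the sampling interval, whereas the sample counts vary from run to run. That difference mirrors the abstract's argument for instrumenting only the communication routines and sampling everything else.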

