Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs

Z. Szebenyi, T. Gamblin, M. Schulz, B. Supinski, F. Wolf, B. Wylie
Published in: 2011 IEEE International Parallel & Distributed Processing Symposium
Publication date: 2011-05-16
DOI: 10.1109/IPDPS.2011.67 (https://doi.org/10.1109/IPDPS.2011.67)
Citation count: 20

Abstract

We can profile the performance behavior of parallel programs at the level of individual call paths through sampling or direct instrumentation. While we can easily control measurement dilation by adjusting the sampling frequency, the statistical nature of sampling and the difficulty of accessing the parameters of sampled events make it unsuitable for obtaining certain communication metrics, such as the size of message payloads. Alternatively, direct instrumentation, which is preferable for capturing message-passing events, can excessively dilate measurements, particularly for C++ programs, which often have many short but frequently called class member functions. Thus, we combine these techniques in a unified framework that exploits the strengths of each approach while avoiding their weaknesses: We use direct instrumentation to intercept MPI routines while we record the execution of the remaining code through low-overhead sampling. One of the main technical hurdles mastered was the inexpensive and portable determination of call-path information during the invocation of MPI routines. We show that the overhead of our implementation is sufficiently low to support substantial performance improvement of a C++ fluid-dynamics code.
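
The division of labor described in the abstract can be illustrated with a small toy sketch in Python. It is not the authors' implementation: `HybridProfiler`, `fake_send`, `compute`, `exchange`, `timestep`, and `run` are all hypothetical names, the "MPI routine" is a stand-in function that transmits nothing, and the call path inside the wrapper is obtained by walking interpreter frames, which is an assumption made for illustration rather than the inexpensive, portable technique the paper develops. The sketch only shows the idea that a wrapper records exact communication metrics (call count, payload size) per call path, while a periodic sampler attributes the remaining execution statistically.

```python
import functools
import sys
import threading
from collections import Counter


class HybridProfiler:
    """Toy hybrid profiler: samples call paths, wraps selected calls."""

    def __init__(self, interval=0.001):
        self.interval = interval      # seconds between samples
        self.samples = Counter()      # call path -> number of samples
        self.comm_calls = Counter()   # call path -> exact number of calls
        self.comm_bytes = Counter()   # call path -> exact payload bytes
        self._stop = threading.Event()
        self._thread = None
        self._target_id = None

    @staticmethod
    def _call_path(frame):
        """Return function names from the outermost frame down to `frame`."""
        names = []
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        return tuple(reversed(names))

    def _sample_loop(self):
        # Wake up every `interval` seconds and record where the profiled
        # thread currently is; stop as soon as the event is set.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self._target_id)
            if frame is not None:
                self.samples[self._call_path(frame)] += 1

    def start(self):
        """Begin sampling the thread that calls start()."""
        self._target_id = threading.get_ident()
        self._stop.clear()
        self._thread = threading.Thread(target=self._sample_loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def instrument(self, func):
        """Wrap `func` so every call records exact metrics per call path."""
        @functools.wraps(func)
        def wrapper(payload, *args, **kwargs):
            # The caller's frames plus the wrapped routine form the call path.
            path = self._call_path(sys._getframe(1)) + (func.__name__,)
            self.comm_calls[path] += 1
            self.comm_bytes[path] += len(payload)
            return func(payload, *args, **kwargs)
        return wrapper


def fake_send(payload):
    """Hypothetical stand-in for an MPI send; nothing is transmitted."""
    return len(payload)


def compute(n):
    """Computation phase: observed only through sampling."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def exchange(send):
    """Communication phase: observed through the instrumented wrapper."""
    send(b"x" * 1024)


def timestep(send):
    compute(200_000)
    exchange(send)


def run(steps=3):
    """Profile `steps` time steps and return the filled profiler."""
    profiler = HybridProfiler()
    send = profiler.instrument(fake_send)
    profiler.start()
    try:
        for _ in range(steps):
            timestep(send)
    finally:
        profiler.stop()
    return profiler
```

Calling `run(steps=3)` returns a profiler whose `comm_calls` and `comm_bytes` hold exact values for the path ending in `run`, `timestep`, `exchange`, `fake_send`, whereas the contents of `samples` vary from run to run. That contrast mirrors the abstract's argument: sampling is cheap but statistical, and only the wrapper can see a parameter such as the payload size.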