Using automated performance modeling to find scalability bugs in complex codes

2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC). Pub Date: 2013-11-17. DOI: 10.1145/2503210.2503277

A. Calotoiu, T. Hoefler, Marius Poke, F. Wolf

Citations: 137

Abstract

Many parallel applications suffer from latent performance limitations that may prevent them from scaling to larger machine sizes. Often, such scalability bugs manifest themselves only when an attempt to scale the code is actually being made, a point where remediation can be difficult. However, creating analytical performance models that would allow such issues to be pinpointed earlier is so laborious that application developers attempt it at most for a few selected kernels, running the risk of missing harmful bottlenecks. In this paper, we show how both the coverage and the speed of this scalability analysis can be substantially improved. By automatically generating an empirical performance model for each part of a parallel program, we can easily identify those parts that will reduce performance at larger core counts. Using a climate simulation as an example, we demonstrate that scalability bugs are not confined to the routines usually chosen as kernels.
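The abstract does not spell out the modeling procedure. As a minimal sketch, assuming a simplified one-term hypothesis search (fit t(p) = c0 + c1 * p^i * log2(p)^j for a few candidate exponent pairs and keep the best least-squares fit), the Python below builds one empirical model per program part and ranks the parts by predicted time at a larger process count. The exponent sets, the function names (`fit_term`, `best_model`, `rank_parts`, `synthetic_measurements`), the part names, and the timing data are all hypothetical illustrations, not the authors' implementation.

```python
import math
from itertools import product

# Hypothetical search space: exponent i for p**i and j for log2(p)**j.
# These sets are illustrative assumptions, not values taken from the paper.
I_EXPONENTS = [0, 0.5, 1, 1.5, 2, 2.5, 3]
J_EXPONENTS = [0, 1, 2]


def fit_term(ps, ts, i, j):
    """Least-squares fit of t(p) = c0 + c1 * p**i * log2(p)**j.

    Returns (c0, c1, rss), or None if the term is constant over ps.
    """
    xs = [p**i * math.log2(p) ** j for p in ps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # e.g. i = j = 0: the term cannot separate c0 from c1
    c1 = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts)) / sxx
    c0 = mean_t - c1 * mean_x
    rss = sum((t - (c0 + c1 * x)) ** 2 for x, t in zip(xs, ts))
    return c0, c1, rss


def best_model(ps, ts):
    """Return (i, j, c0, c1, rss) for the hypothesis with the smallest RSS."""
    best = None
    for i, j in product(I_EXPONENTS, J_EXPONENTS):
        fit = fit_term(ps, ts, i, j)
        if fit is not None and (best is None or fit[2] < best[4]):
            best = (i, j) + fit
    return best


def predict(model, p):
    """Evaluate a fitted model at process count p."""
    i, j, c0, c1, _ = model
    return c0 + c1 * p**i * math.log2(p) ** j


def rank_parts(measurements, target_p):
    """Fit one model per program part; sort parts by predicted time at target_p."""
    ranking = []
    for name, (ps, ts) in measurements.items():
        model = best_model(ps, ts)
        ranking.append((name, predict(model, target_p), model[0], model[1]))
    return sorted(ranking, key=lambda row: row[1], reverse=True)


def synthetic_measurements():
    """Hypothetical, noise-free timings (seconds) for three made-up program parts."""
    ps = [2, 4, 8, 16, 32]
    return {
        "compute": (ps, [20.0 + 0.5 * math.log2(p) for p in ps]),
        "halo": (ps, [1.0 + 0.1 * p**0.5 for p in ps]),
        "allreduce": (ps, [0.1 + 0.001 * p * math.log2(p) for p in ps]),
    }


if __name__ == "__main__":
    data = synthetic_measurements()
    for target in (32, 2**16):
        print(f"p = {target}:")
        for name, t, i, j in rank_parts(data, target):
            print(f"  {name:10s} {t:10.3f} s  ~ p^{i} * log2(p)^{j}")
```

With these made-up timings, `compute` is the most expensive part at p = 32, but extrapolating each fitted model to p = 2^16 moves `allreduce` to the top, the kind of shift the abstract describes for routines outside the usual kernels. Selecting purely by residual sum of squares works here only because the synthetic data is noise-free and every hypothesis has the same number of coefficients; real measurements would call for a more careful selection criterion.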
