Transformation of Scientific Algorithms to Parallel Computing Code: Single GPU and MPI Multi GPU Backends with Subdomain Support

2011 Symposium on Application Accelerators in High-Performance Computing. Pub Date: 2011-07-01. DOI: 10.1109/SAAHPC.2011.12

B. Meyer, Christian Plessl, Jens Forstner

Citations: 3

Abstract

We propose an approach to high-performance scientific computing that separates the description of algorithms from the generation of code for parallel hardware architectures such as multi-core CPUs, GPUs, or FPGAs. Scientists can thus focus on their domain of expertise and describe their algorithms generically, without needing knowledge of specific hardware architectures, programming languages, APIs, or tool flows. We present a prototype implementation that transforms generic descriptions of algorithms with intensive array-type data access into highly optimized code for GPU and multi-GPU cluster systems. We evaluate the approach on an example from the domain of computational nanophotonics and show that our current tool flow generates efficient code that achieves speedups of up to 15.3x on a single GPU and 35.9x on a multi-GPU setup, compared to a reference CPU implementation.
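To make the idea of subdomain support concrete, the following is a minimal sketch and not the authors' tool flow or generated code. It assumes a hypothetical 1D three-point averaging stencil as a stand-in for an algorithm with intensive array-type data access, and it simulates the subdomains in a single process. The names `step`, `run_single`, and `run_subdomains` are illustrative only. In an MPI multi-GPU backend, the edge values read from neighbouring subdomains here would presumably be exchanged as messages between ranks.

```python
def step(u, left, right):
    """One sweep of a 3-point averaging stencil over u.

    left and right are the values just outside u: either the fixed
    domain boundary or a halo cell copied from a neighbouring subdomain.
    """
    padded = [left] + u + [right]
    return [
        (padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
        for i in range(1, len(padded) - 1)
    ]


def run_single(u, steps):
    """Reference run: the whole domain is updated as one array."""
    for _ in range(steps):
        u = step(u, 0.0, 0.0)
    return u


def run_subdomains(u, steps, parts):
    """Same computation, with the domain split into contiguous subdomains.

    Assumes len(u) >= parts so that no subdomain is empty.
    """
    n = len(u)
    bounds = [n * p // parts for p in range(parts + 1)]
    subs = [u[bounds[p]:bounds[p + 1]] for p in range(parts)]
    for _ in range(steps):
        # Halo exchange: each subdomain reads the edge cell of its neighbours
        # (or the fixed boundary value 0.0 at the ends of the global domain).
        halos = []
        for p in range(parts):
            left = subs[p - 1][-1] if p > 0 else 0.0
            right = subs[p + 1][0] if p < parts - 1 else 0.0
            halos.append((left, right))
        # Every subdomain can now be updated independently of the others.
        subs = [step(s, l, r) for s, (l, r) in zip(subs, halos)]
    # Gather the subdomains back into one array.
    return [x for s in subs for x in s]
```

Calling `run_subdomains(u, steps, parts)` on the same input as `run_single(u, steps)` yields the same array, because each cell sees identical neighbour values in both versions. The point of the split is that the per-subdomain updates between halo exchanges are independent and could each run on a separate device.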


