Converting data-parallelism to task-parallelism by rewrites: purely functional programs across multiple GPUs

Proceedings of the 4th ACM SIGPLAN Workshop on Functional High-Performance Computing. Pub Date: 2015-08-30. DOI: 10.1145/2808091.2808093

Bo Joel Svensson, Michael Vollmer, Eric Holk, T. L. McDonell, Ryan Newton

Citations: 3

Abstract

High-level domain-specific languages for array processing on the GPU are increasingly common, but they typically only run on a single GPU. As computational power is distributed across more devices, languages must target multiple devices simultaneously. To this end, we present a compositional translation that fissions data-parallel programs in the Accelerate language, allowing subsequent compiler and runtime stages to map computations onto multiple devices for improved performance, even for programs that begin as a single data-parallel kernel.
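The following Python sketch is a rough illustration of what "fissioning" a data-parallel operation means. It splits one logical map or fold into independent per-chunk tasks and then recombines their results. It is not the Accelerate API and not the paper's translation. The names `split_array`, `fission_map` and `fission_fold` are hypothetical, and a thread pool stands in for multiple GPUs.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def split_array(xs: Sequence[A], parts: int) -> List[Sequence[A]]:
    """Split xs into `parts` contiguous chunks whose lengths differ by at most one."""
    base, extra = divmod(len(xs), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(xs[start:start + size])
        start += size
    return chunks


def fission_map(f: Callable[[A], B], xs: Sequence[A], devices: int) -> List[B]:
    # Rewrite: map f (a ++ b) == map f a ++ map f b.
    # Each chunk becomes an independent task; threads stand in for devices (assumption).
    chunks = split_array(xs, devices)
    with ThreadPoolExecutor(max_workers=devices) as pool:
        parts = list(pool.map(lambda c: [f(x) for x in c], chunks))
    # Concatenate per-chunk results in their original order.
    return [y for part in parts for y in part]


def fission_fold(op: Callable[[A, A], A], identity: A, xs: Sequence[A], devices: int) -> A:
    # Valid only if `op` is associative and `identity` is its unit element:
    # each task computes a partial reduction, and a final step combines the partials.
    chunks = split_array(xs, devices)
    with ThreadPoolExecutor(max_workers=devices) as pool:
        partials = list(pool.map(lambda c: reduce(op, c, identity), chunks))
    return reduce(op, partials, identity)
```

For example, `fission_map(lambda x: x * x, list(range(10)), 3)` and `fission_fold(lambda a, b: a + b, 0, list(range(10)), 3)` produce the same results as a single sequential map and sum. The map splits freely. The fold needs an extra combine step and an associative operator, which is one reason a translation like this has to treat each kind of data-parallel operation differently.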
