Array streaming for array programming

Int. J. Comput. Sci. Eng. Pub Date : 1900-01-01 DOI:10.1504/IJCSE.2017.10011354

M. R. B. Kristensen, J. Avery


Abstract

A barrier to efficient array programming, for example in Python/NumPy, is that algorithms written as pure array operations with no explicit loops, while most efficient on small inputs, can lead to explosions in memory use. This paper presents a solution to that problem: array streaming, implemented in Bohrium, a high-performance framework for automatic parallelisation. Streaming makes it possible to use array programming directly in Python/NumPy code even when the apparent memory requirement exceeds the machine's capacity, because the temporary arrays are eliminated and the calculations are performed in per-thread registers instead. Using Bohrium, we automatically fuse, stream, JIT-compile, and execute NumPy array operations on GPGPUs without modifying the user programs. We present performance evaluations of three benchmarks, all of which show dramatic reductions in memory use from streaming, with corresponding improvements in speed and in utilisation of GPGPU cores. The fusion step is implemented within the theoretical framework presented in Kristensen et al. (2016), using a streaming-maximising cost function. The streaming-enabled Bohrium effortlessly runs programs on input sizes several orders of magnitude larger than those that crash pure NumPy by exhausting system memory.
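The memory problem described in the abstract can be illustrated with a small sketch. It is not Bohrium's implementation: Bohrium performs fusion and streaming automatically and keeps intermediates in per-thread registers on the GPGPU, whereas the code below only imitates the effect by hand in plain NumPy, processing the input in fixed-size blocks. The function names and the `block` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np


def sum_of_norms_unfused(x, y):
    """Pure array style: each operation allocates a full-size temporary."""
    t1 = x * x        # temporary with x.size elements
    t2 = y * y        # second full-size temporary
    t3 = t1 + t2      # third full-size temporary
    t4 = np.sqrt(t3)  # fourth full-size temporary
    return t4.sum()


def sum_of_norms_blocked(x, y, block=4096):
    """Hand-written stand-in for streaming (an assumption, not Bohrium's method):
    intermediates never hold more than `block` elements at a time."""
    total = 0.0
    for start in range(0, x.size, block):
        xb = x[start:start + block]  # slicing returns a view, not a copy
        yb = y[start:start + block]
        total += np.sqrt(xb * xb + yb * yb).sum()
    return total


rng = np.random.default_rng(0)
x = rng.random(100_000)
y = rng.random(100_000)

a = sum_of_norms_unfused(x, y)
b = sum_of_norms_blocked(x, y)
```

Both functions compute the same reduction, up to floating-point rounding from the different summation order. The difference is peak memory: the first version holds several arrays as large as the input at once, while the second bounds its intermediates by the block size regardless of input length.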
