Array streaming for array programming

M. R. B. Kristensen, J. Avery
Int. J. Comput. Sci. Eng. DOI: 10.1504/IJCSE.2017.10011354 (https://doi.org/10.1504/IJCSE.2017.10011354)

Abstract

A barrier to efficient array programming, for example in Python/NumPy, is that algorithms written as pure array operations, completely without loops, can lead to explosions in memory use, even though they are the most efficient form on small input. This paper presents a solution to this problem using array streaming, implemented in Bohrium, a high-performance framework for automatic parallelisation. Streaming makes it possible to use array programming in Python/NumPy code directly, even when the apparent memory requirement exceeds the machine capacity, because it eliminates the temporary memory overhead by performing calculations in per-thread registers. Using Bohrium, we automatically fuse, stream, JIT-compile, and execute NumPy array operations on GPGPUs without modification to the user programs. We present performance evaluations of three benchmarks, all of which show dramatic reductions in memory use from streaming, yielding corresponding improvements in speed and in utilisation of GPGPU cores. The fusion step is implemented using the theoretical framework presented in Kristensen et al. (2016), with a streaming-maximising cost function. The streaming-enabled Bohrium runs programs on input sizes several orders of magnitude beyond the sizes at which pure NumPy crashes from exhausting system memory.
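The memory problem described in the abstract can be illustrated with a small hand-written sketch. This is not Bohrium's implementation: Bohrium is described as fusing and streaming automatically and keeping intermediate values in per-thread registers, whereas the code below streams manually through block-sized NumPy buffers on the CPU. The function names and the `block` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def norm_sum_array(n):
    """Pure array style: each intermediate is a full temporary of n elements."""
    x = np.arange(n, dtype=np.float64)   # n elements
    y = x + 1.0                          # another n elements
    r = np.sqrt(x * x + y * y)           # several more n-element temporaries
    return r.sum()


def norm_sum_streamed(n, block=1 << 16):
    """Manually streamed style: temporaries never exceed `block` elements."""
    total = 0.0
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Only this slice of the index range is materialised at any time.
        x = np.arange(start, stop, dtype=np.float64)
        y = x + 1.0
        total += np.sqrt(x * x + y * y).sum()
    return total
```

Both functions compute the same reduction, but the peak memory of the first grows with `n`, while the second is bounded by `block`. The point of the paper, as the abstract states it, is that the user keeps writing the first form and the framework obtains the memory behaviour of the second without changes to the program.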