Rank-Polymorphism for Shape-Guided Blocking

Artjoms Šinkarovs, Thomas Koopman, S. Scholz
Proceedings of the 11th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing
Published: 2023-08-30
DOI: 10.1145/3609024.3609410 (https://doi.org/10.1145/3609024.3609410)

Abstract

Many numerical algorithms on matrices or tensors can be formulated in a blocked style, which improves performance through better cache locality. In imperative languages, blocking is achieved by introducing additional layers of nested loops along with suitable adjustments to the index computations. This process is tedious and error-prone, and it is also difficult to implement a generically blocked version that supports arbitrary levels of blocking. Using matrix multiplication as an example, this paper demonstrates how rank-polymorphic array languages enable the specification of such generically blocked algorithms in a simple, recursive form. The depth of the blocking as well as the blocking factors can be encoded in the structure of array shapes. In turn, reshaping arrays makes it possible to switch between blocked and non-blocked arrays. Rank-polymorphic array combinators make it possible to avoid any specification of loop boundaries or explicit index computations. First, we propose a dependently-typed framework for rank-polymorphic arrays. We use it to demonstrate that all blocked algorithms can be naturally derived by induction on the argument shapes. Our framework guarantees the absence of out-of-bounds indexing, and we also prove that all the blocked versions compute the same results as the canonical algorithm. Second, we translate our specification to the array language SaC. Not only do we achieve similar conciseness in the implementation, but we also observe good performance of the generated code: a 7% improvement over the highly-optimised OpenBLAS library, and 3% over Intel’s MKL library, when running on a 32-core shared-memory system.
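
The idea that blocking depth can live in the shape of the data, with the multiplication defined by recursion on that shape, can be illustrated with a minimal sketch. This is not the paper's dependently-typed specification or its SaC code. It is an untyped Python approximation using nested lists, and the names `block`, `unblock`, `add`, and `mul` are hypothetical helpers introduced only for this illustration. Here `depth` counts the levels of matrix nesting: depth 0 is a scalar, depth 1 is a plain matrix, and depth 2 is a matrix of blocks.

```python
def block(m, bs):
    """Reshape matrix m (a list of rows) into a matrix of bs-by-bs blocks.

    Elements of m are treated as opaque, so block can be applied repeatedly
    to add further levels of blocking.
    """
    n_rows, n_cols = len(m), len(m[0])
    assert n_rows % bs == 0 and n_cols % bs == 0, "block size must divide shape"
    return [[[row[j:j + bs] for row in m[i:i + bs]]
             for j in range(0, n_cols, bs)]
            for i in range(0, n_rows, bs)]


def unblock(m):
    """Inverse of block: remove the outermost level of blocking."""
    out = []
    for block_row in m:
        rows_per_block = len(block_row[0])
        for r in range(rows_per_block):
            # Concatenate row r of every block in this block-row.
            out.append([x for blk in block_row for x in blk[r]])
    return out


def add(x, y, depth):
    """Element-wise addition of two values with the same nesting depth."""
    if depth == 0:
        return x + y
    return [[add(p, q, depth - 1) for p, q in zip(rx, ry)]
            for rx, ry in zip(x, y)]


def mul(x, y, depth):
    """Matrix product by recursion on nesting depth.

    At depth 0 this is scalar multiplication. At depth d it is the usual
    row-by-column product, where multiplying and adding elements means
    calling mul and add at depth d - 1. No index arithmetic is needed.
    """
    if depth == 0:
        return x * y
    cols = list(zip(*y))  # columns of y
    result = []
    for row in x:
        out_row = []
        for col in cols:
            acc = mul(row[0], col[0], depth - 1)
            for p, q in zip(row[1:], col[1:]):
                acc = add(acc, mul(p, q, depth - 1), depth - 1)
            out_row.append(acc)
        result.append(out_row)
    return result
```

As a usage example, `unblock(mul(block(a, 2), block(b, 2), 2))` computes the same product as `mul(a, b, 1)` for square matrices whose size is divisible by 2, and applying `block` twice gives a two-level blocked form that is multiplied with `depth=3`. The sketch says nothing about performance; it only shows that one recursive definition covers every blocking depth.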