Rank-Polymorphism for Shape-Guided Blocking

Proceedings of the 11th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing. Pub Date: 2023-08-30. DOI: 10.1145/3609024.3609410

Artjoms Šinkarovs, Thomas Koopman, S. Scholz


Citations: 0

Abstract

Many numerical algorithms on matrices or tensors can be formulated in a blocked style, which improves performance through better cache locality. In imperative languages, blocking is achieved by introducing additional layers of nested loops along with suitable adjustments to the index computations. This process is tedious and error-prone, and it is also difficult to implement a generically blocked version that supports arbitrary levels of blocking. Using matrix multiplication as an example, this paper demonstrates how rank-polymorphic array languages enable the specification of such generically blocked algorithms in a simple, recursive form. The depth of the blocking as well as the blocking factors can be encoded in the structure of array shapes. In turn, reshaping arrays makes it possible to switch between blocked and non-blocked arrays. Through rank-polymorphic array combinators, any specification of loop boundaries or explicit index computations can be avoided. Firstly, we propose a dependently-typed framework for rank-polymorphic arrays. We use it to demonstrate that all blocked algorithms can be naturally derived by induction on the argument shapes. Our framework guarantees the absence of out-of-bounds indexing, and we also prove that all the blocked versions compute the same results as the canonical algorithm. Secondly, we translate our specification to the array language SaC. Not only do we show that we achieve similar conciseness in the implementation, but we also observe good performance of the generated code. We achieve a 7% improvement compared to the highly-optimised OpenBLAS library, and 3% compared to Intel’s MKL library, when running on a 32-core shared-memory system.
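The idea that blocking depth and blocking factors live in the array shape can be illustrated with a small Python sketch. This is not the paper's dependently-typed framework or its SaC code; it is only an analogy on nested lists, and the names `block`, `unblock`, `add`, and `mul` are hypothetical helpers introduced here. A matrix is a list of rows whose elements are either numbers or, recursively, matrices, so a single recursive `mul` handles any level of blocking without extra loop layers or index arithmetic. The sketch assumes the block sizes divide the matrix dimensions.

```python
def add(a, b):
    """Element-wise sum of two scalars or two equally shaped nested matrices."""
    if isinstance(a, list):
        return [add(x, y) for x, y in zip(a, b)]
    return a + b


def mul(a, b):
    """Multiply scalars, or matrices whose elements are scalars or blocks."""
    if not isinstance(a, list):
        return a * b  # base case: plain scalar product
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Elements may themselves be matrices: recurse one level down.
            acc = mul(a[i][0], b[0][j])
            for k in range(1, inner):
                acc = add(acc, mul(a[i][k], b[k][j]))
            row.append(acc)
        out.append(row)
    return out


def block(m, br, bc):
    """Regroup a matrix into a matrix of br x bc blocks (sizes must divide)."""
    rows, cols = len(m), len(m[0])
    assert rows % br == 0 and cols % bc == 0
    return [[[row[j:j + bc] for row in m[i:i + br]]
             for j in range(0, cols, bc)]
            for i in range(0, rows, br)]


def unblock(m):
    """Inverse of block: remove the outermost level of blocking."""
    out = []
    for block_row in m:
        for r in range(len(block_row[0])):
            out.append([x for blk in block_row for x in blk[r]])
    return out


a = [[i * 4 + j for j in range(4)] for i in range(4)]
b = [[i - j for j in range(4)] for i in range(4)]
# One level of 2x2 blocking computes the same product as the plain algorithm.
print(unblock(mul(block(a, 2, 2), block(b, 2, 2))) == mul(a, b))  # True
```

Applying `block` twice gives two levels of blocking, and the same `mul` still applies; two calls to `unblock` then recover the flat result. The recursion on nesting depth loosely mirrors the abstract's claim that blocked algorithms follow by induction on the argument shapes, but without any of the static guarantees the paper proves.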


