Proceedings of the 11th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing: Latest Publications

Shape-Constrained Array Programming with Size-Dependent Types

Pub Date : 2023-08-30 DOI: 10.1145/3609024.3609412

Lubin Bailly, Troels Henriksen, M. Elsman

We present a dependent type system for enforcing array-size consistency in an ML-style functional array language. Our goal is to enforce shape-consistency at compile time and allow nontrivial transformations on array shapes, without the complexity such features tend to introduce in dependently typed languages. Sizes can be arbitrary expressions and size equality is purely syntactical, which fits naturally within a scheme that interprets size-polymorphic functions as having implicit arguments. When non-syntactical equalities are needed, we provide dynamic checking. In contrast to other dependently typed languages, we automate the book-keeping involved in tracking existential sizes, such as when filtering arrays. We formalise a large subset of the presented type system and prove it sound. We also discuss how to adapt the type system for a real implementation, including type inference, within the Futhark programming language.
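
The interplay between syntactic size equality and dynamic checking described in the abstract can be illustrated with a toy model. The following is a minimal sketch in Python, not the paper's type system or Futhark's implementation: the names `Arr`, `concat_type`, `zip_type`, `evaluate`, and `coerce` are hypothetical, and sizes are restricted to variables, integer literals, and sums.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# A size expression: a variable name, an integer literal, or ("+", left, right).
# (Hypothetical representation chosen for this sketch.)
Size = Union[str, int, Tuple]


@dataclass(frozen=True)
class Arr:
    """Static type of a one-dimensional array, reduced to its size expression."""
    size: Size


def concat_type(a: Arr, b: Arr) -> Arr:
    """concat : [n]t -> [m]t -> [n + m]t"""
    return Arr(("+", a.size, b.size))


def zip_type(a: Arr, b: Arr) -> Arr:
    """zip : [n]t -> [n]t -> [n]t, with sizes compared purely syntactically."""
    if a.size != b.size:
        raise TypeError(f"size mismatch: {a.size!r} vs {b.size!r}")
    return Arr(a.size)


def evaluate(size: Size, env: Dict[str, int]) -> int:
    """Compute the run-time value of a size expression."""
    if isinstance(size, int):
        return size
    if isinstance(size, str):
        return env[size]
    _, left, right = size
    return evaluate(left, env) + evaluate(right, env)


def coerce(a: Arr, target: Size, env: Dict[str, int]) -> Arr:
    """Retype an array to `target`, checking the equality dynamically."""
    if evaluate(a.size, env) != evaluate(target, env):
        raise ValueError("dynamic size check failed")
    return Arr(target)


xs, ys = Arr("n"), Arr("m")
left = concat_type(xs, ys)    # size n + m
right = concat_type(ys, xs)   # size m + n: equal in value, different in syntax

try:
    zip_type(left, right)
    rejected = False
except TypeError:
    rejected = True

# The non-syntactic equality n + m = m + n is established by a run-time check.
fixed = coerce(right, left.size, {"n": 2, "m": 3})
zipped = zip_type(left, fixed)
```

In this sketch the static check rejects `zip_type(left, right)` because `n + m` and `m + n` differ as expressions, and the explicit `coerce` step is where the equality is verified against actual values.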

Citations: 0

Rank-Polymorphism for Shape-Guided Blocking

Pub Date : 2023-08-30 DOI: 10.1145/3609024.3609410

Artjoms Šinkarovs, Thomas Koopman, S. Scholz

Many numerical algorithms on matrices or tensors can be formulated in a blocking style, which improves performance through better cache locality. In imperative languages, blocking is achieved by introducing additional layers of nested loops along with suitable adjustments to index computations. This process is tedious and error-prone, and it is also difficult to implement a generically blocked version that supports arbitrary levels of blocking. Using matrix multiplication as an example, this paper demonstrates how rank-polymorphic array languages enable the specification of such generically blocked algorithms in a simple, recursive form. The depth of the blocking as well as the blocking factors can be encoded in the structure of array shapes. In turn, reshaping arrays makes it possible to switch between blocked and non-blocked arrays. Through rank-polymorphic array combinators, any specification of loop boundaries or explicit index computations can be avoided. First, we propose a dependently-typed framework for rank-polymorphic arrays. We use it to demonstrate that all blocked algorithms can be naturally derived by induction on the argument shapes. Our framework guarantees the absence of out-of-bounds indexing, and we also prove that all the blocked versions compute the same results as the canonical algorithm. Second, we translate our specification to the array language SaC. Not only do we show that we achieve similar conciseness in the implementation, but we also observe good performance of the generated code. We achieve a 7% improvement compared to the highly-optimised OpenBLAS library, and 3% compared to Intel’s MKL library when running on a 32-core shared-memory system.
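
The idea of deriving blocked matrix multiplication by recursion on the structure of the arguments can be sketched in plain Python. This is an illustrative model only, not the authors' dependently-typed framework or their SaC code: nested lists stand in for array shapes, nothing is checked statically, and the helper names `add`, `matmul`, `block`, and `unblock` are hypothetical.

```python
def add(x, y):
    """Add two scalars, or two equally shaped (possibly blocked) matrices."""
    if isinstance(x, list):
        return [[add(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]
    return x + y


def matmul(a, b):
    """Multiply matrices whose elements are scalars or, recursively, blocks."""
    if not isinstance(a, list):
        return a * b  # base case: scalar product
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Element products are themselves matrix products when blocked.
            acc = matmul(a[i][0], b[0][j])
            for k in range(1, inner):
                acc = add(acc, matmul(a[i][k], b[k][j]))
            row.append(acc)
        out.append(row)
    return out


def block(m, bs):
    """Reshape a matrix into a matrix of bs x bs blocks (bs must divide both dimensions)."""
    return [
        [[row[c:c + bs] for row in m[r:r + bs]] for c in range(0, len(m[0]), bs)]
        for r in range(0, len(m), bs)
    ]


def unblock(m):
    """Inverse of block: remove one level of blocking."""
    out = []
    for block_row in m:
        for r in range(len(block_row[0])):
            out.append([x for blk in block_row for x in blk[r]])
    return out


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]

flat = matmul(a, b)
one_level = unblock(matmul(block(a, 2), block(b, 2)))
two_level = unblock(unblock(matmul(block(block(a, 1), 2), block(block(b, 1), 2))))
```

The same `matmul` definition serves every blocking depth because the recursion follows the nesting of the data, which loosely mirrors the abstract's claim that blocking depth and blocking factors are encoded in array shapes.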

Citations: 0

Efficient GPU Implementation of Affine Index Permutations on Arrays

Pub Date : 2023-06-13 DOI: 10.1145/3609024.3609411

Mathis Bouverot-Dupuis, M. Sheeran

Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting great regularity in their memory access patterns. In this paper we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels whose speed is comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
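
For readers unfamiliar with BMMC permutations: an index x, viewed as a vector of n bits, is sent to y = A·x ⊕ c, where A is a nonsingular n×n bit matrix, c is an n-bit vector, and arithmetic is over GF(2). The following is a sequential reference sketch in Python of that definition, not the GPU kernels proposed in the paper. The function names and the encoding of each matrix row as an integer bitmask are assumptions made for this example.

```python
def bmmc_index(rows, c, x):
    """Map index x to A.x XOR c over GF(2).

    rows[i] is row i of A as a bitmask: bit j of rows[i] is the entry A[i][j].
    """
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1  # dot product of row i with x, mod 2
        y |= parity << i
    return y ^ c


def bmmc_permute(rows, c, data):
    """Return a new list with data[x] moved to position bmmc_index(rows, c, x)."""
    n = len(rows)
    if len(data) != 1 << n:
        raise ValueError("data length must be 2**n for an n x n bit matrix")
    out = [None] * len(data)
    seen = set()
    for x, value in enumerate(data):
        y = bmmc_index(rows, c, x)
        if y in seen:
            # A singular matrix maps two indices to the same target.
            raise ValueError("bit matrix is singular: not a permutation")
        seen.add(y)
        out[y] = value
    return out


data = list(range(8))

# Bit reversal on 3-bit indices: A is the anti-diagonal matrix, c = 0.
bit_reversed = bmmc_permute([0b100, 0b010, 0b001], 0b000, data)

# Array reversal: A is the identity, c complements every index bit.
reversed_data = bmmc_permute([0b001, 0b010, 0b100], 0b111, data)
```

Bit reversal and array reversal are both members of this class, which gives a sense of how varied the memory access patterns are that a single family of kernels has to handle.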

Citations: 0

Proceedings of the 11th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing

Pub Date : 1900-01-01 DOI: 10.1145/3609024

Citations: 0
