关于并行前缀和推理的一个健全和完整的抽象

Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages Pub Date : 2014-01-08 DOI:10.1145/2535838.2535882

Nathan Chong, A. Donaldson, J. Ketema

{"title":"关于并行前缀和推理的一个健全和完整的抽象","authors":"Nathan Chong, A. Donaldson, J. Ketema","doi":"10.1145/2535838.2535882","DOIUrl":null,"url":null,"abstract":"Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance. We introduce a novel abstraction, the interval of summations, that allows scalable reasoning about implementations of prefix sums. We present this abstraction as a monoid, and prove a soundness and completeness result showing that a generic sequential prefix sum implementation is correct for an array of length $n$ if and only if it computes the correct result for a specific test case when instantiated with the interval of summations monoid. This allows correctness to be established by running a single test where the input and result require O(n lg(n)) space. This improves upon an existing result by Sheeran where the input requires O(n lg(n)) space and the result O(n2 \\lg(n)) space, and is more feasible for large n than a method by Voigtlaender that uses O(n) space for the input and result but requires running O(n2) tests. We then extend our abstraction and results to the context of data-parallel programs, developing an automated verification method for GPU implementations of prefix sums. Our method uses static verification to prove that a generic prefix sum implementation is data race-free, after which functional correctness of the implementation can be determined by running a single test case under the interval of summations abstraction. We present an experimental evaluation using four different prefix sum algorithms, showing that our method is highly automatic, scales to large thread counts, and significantly outperforms Voigtlaender's method when applied to large arrays.","PeriodicalId":20683,"journal":{"name":"Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2014-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"A sound and complete abstraction for reasoning about parallel prefix sums\",\"authors\":\"Nathan Chong, A. Donaldson, J. Ketema\",\"doi\":\"10.1145/2535838.2535882\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance. We introduce a novel abstraction, the interval of summations, that allows scalable reasoning about implementations of prefix sums. We present this abstraction as a monoid, and prove a soundness and completeness result showing that a generic sequential prefix sum implementation is correct for an array of length $n$ if and only if it computes the correct result for a specific test case when instantiated with the interval of summations monoid. This allows correctness to be established by running a single test where the input and result require O(n lg(n)) space. This improves upon an existing result by Sheeran where the input requires O(n lg(n)) space and the result O(n2 \\\\lg(n)) space, and is more feasible for large n than a method by Voigtlaender that uses O(n) space for the input and result but requires running O(n2) tests. We then extend our abstraction and results to the context of data-parallel programs, developing an automated verification method for GPU implementations of prefix sums. Our method uses static verification to prove that a generic prefix sum implementation is data race-free, after which functional correctness of the implementation can be determined by running a single test case under the interval of summations abstraction. We present an experimental evaluation using four different prefix sum algorithms, showing that our method is highly automatic, scales to large thread counts, and significantly outperforms Voigtlaender's method when applied to large arrays.\",\"PeriodicalId\":20683,\"journal\":{\"name\":\"Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-01-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2535838.2535882\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2535838.2535882","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 29

摘要

前缀和是实现许多并发软件应用程序的关键构建块，最近有很多工作是为了有效地实现在大规模并行图形处理单元(gpu)上运行的前缀和。由于前缀和位于许多gpu加速应用程序的核心，因此前缀和实现的正确性至关重要。我们引入了一个新的抽象，求和区间，它允许对前缀和的实现进行可伸缩的推理。我们将这个抽象表示为一个单群，并证明了一个稳健性和完备性结果，该结果表明对于长度为$n$的数组，一般顺序前缀和实现是正确的，当且仅当它以求和单群区间实例化时计算特定测试用例的正确结果。这允许通过运行单个测试来建立正确性，其中输入和结果需要O(n lg(n))个空间。这改进了Sheeran的现有结果，其中输入需要O(n lg(n))空间，结果需要O(n2 \lg(n))空间，并且对于大n来说比Voigtlaender使用O(n)空间用于输入和结果但需要运行O(n2)个测试的方法更可行。然后，我们将我们的抽象和结果扩展到数据并行程序的上下文中，开发了一种用于GPU实现前缀和的自动验证方法。我们的方法使用静态验证来证明通用前缀和实现是无数据竞争的，之后可以通过在求和抽象的区间内运行单个测试用例来确定实现的功能正确性。我们使用四种不同的前缀和算法进行了实验评估，结果表明我们的方法高度自动化，适用于大线程数，并且在应用于大型数组时明显优于Voigtlaender的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A sound and complete abstraction for reasoning about parallel prefix sums

Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance. We introduce a novel abstraction, the interval of summations, that allows scalable reasoning about implementations of prefix sums. We present this abstraction as a monoid, and prove a soundness and completeness result showing that a generic sequential prefix sum implementation is correct for an array of length $n$ if and only if it computes the correct result for a specific test case when instantiated with the interval of summations monoid. This allows correctness to be established by running a single test where the input and result require O(n lg(n)) space. This improves upon an existing result by Sheeran where the input requires O(n lg(n)) space and the result O(n2 \lg(n)) space, and is more feasible for large n than a method by Voigtlaender that uses O(n) space for the input and result but requires running O(n2) tests. We then extend our abstraction and results to the context of data-parallel programs, developing an automated verification method for GPU implementations of prefix sums. Our method uses static verification to prove that a generic prefix sum implementation is data race-free, after which functional correctness of the implementation can be determined by running a single test case under the interval of summations abstraction. We present an experimental evaluation using four different prefix sum algorithms, showing that our method is highly automatic, scales to large thread counts, and significantly outperforms Voigtlaender's method when applied to large arrays.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助