验证弱序体系结构一致性的并行方法

2015 International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2015-10-18 DOI:10.1109/PACT.2015.18

Adam McLaughlin, D. Merrill, M. Garland, David A. Bader

{"title":"验证弱序体系结构一致性的并行方法","authors":"Adam McLaughlin, D. Merrill, M. Garland, David A. Bader","doi":"10.1109/PACT.2015.18","DOIUrl":null,"url":null,"abstract":"Contemporary microprocessors use relaxed memory consistency models to allow for aggressive optimizations in hardware. This enhancement in performance comes at the cost of design complexity and verification effort. In particular, verifying an execution of a program against its system's memory consistency model is an NP-complete problem. Several graph-based approximations to this problem based on carefully constructed randomized test programs have been proposed in the literature, however, such approaches are sequential and execute slowly on large graphs of interest. Unfortunately, the ability to execute larger tests is tremendously important, since such tests enable one to expose bugs more quickly. Successfully executing more tests per unit time is also desirable, since it allows for one to check for a greater variety of errors in the memory subsystem by utilizing a more diverse set of tests. This paper improves upon existing work by introducing an algorithm that not only reduces the time complexity of the verification process, but also facilitates the development of parallel algorithms for solving these problems. We first show performance improvements from a sequential approach and gain further performance from parallel implementations in OpenMP and CUDA. For large tests of interest, our GPU implementation achieves an average application speedup of 26.36x over existing techniques in use at NVIDIA.","PeriodicalId":385398,"journal":{"name":"2015 International Conference on Parallel Architecture and Compilation (PACT)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Parallel Methods for Verifying the Consistency of Weakly-Ordered Architectures\",\"authors\":\"Adam McLaughlin, D. Merrill, M. Garland, David A. Bader\",\"doi\":\"10.1109/PACT.2015.18\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Contemporary microprocessors use relaxed memory consistency models to allow for aggressive optimizations in hardware. This enhancement in performance comes at the cost of design complexity and verification effort. In particular, verifying an execution of a program against its system's memory consistency model is an NP-complete problem. Several graph-based approximations to this problem based on carefully constructed randomized test programs have been proposed in the literature, however, such approaches are sequential and execute slowly on large graphs of interest. Unfortunately, the ability to execute larger tests is tremendously important, since such tests enable one to expose bugs more quickly. Successfully executing more tests per unit time is also desirable, since it allows for one to check for a greater variety of errors in the memory subsystem by utilizing a more diverse set of tests. This paper improves upon existing work by introducing an algorithm that not only reduces the time complexity of the verification process, but also facilitates the development of parallel algorithms for solving these problems. We first show performance improvements from a sequential approach and gain further performance from parallel implementations in OpenMP and CUDA. For large tests of interest, our GPU implementation achieves an average application speedup of 26.36x over existing techniques in use at NVIDIA.\",\"PeriodicalId\":385398,\"journal\":{\"name\":\"2015 International Conference on Parallel Architecture and Compilation (PACT)\",\"volume\":\"46 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Conference on Parallel Architecture and Compilation (PACT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PACT.2015.18\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Parallel Architecture and Compilation (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PACT.2015.18","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

摘要

当代微处理器使用宽松的内存一致性模型，以便对硬件进行积极的优化。这种性能的增强是以设计复杂性和验证工作为代价的。特别是，根据系统的内存一致性模型验证程序的执行是一个np完全问题。文献中已经提出了几个基于精心构建的随机测试程序的基于图的近似方法，然而，这些方法是顺序的，并且在感兴趣的大型图上执行缓慢。不幸的是，执行大型测试的能力非常重要，因为这样的测试可以更快地暴露错误。在单位时间内成功地执行更多的测试也是可取的，因为它允许人们通过使用更多样化的测试集来检查内存子系统中更多种类的错误。本文通过引入一种算法来改进现有的工作，该算法不仅降低了验证过程的时间复杂度，而且还促进了解决这些问题的并行算法的发展。我们首先展示了顺序方法的性能改进，并从OpenMP和CUDA的并行实现中获得了进一步的性能。对于感兴趣的大型测试，我们的GPU实现比NVIDIA使用的现有技术实现了26.36倍的平均应用程序加速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Parallel Methods for Verifying the Consistency of Weakly-Ordered Architectures

Contemporary microprocessors use relaxed memory consistency models to allow for aggressive optimizations in hardware. This enhancement in performance comes at the cost of design complexity and verification effort. In particular, verifying an execution of a program against its system's memory consistency model is an NP-complete problem. Several graph-based approximations to this problem based on carefully constructed randomized test programs have been proposed in the literature, however, such approaches are sequential and execute slowly on large graphs of interest. Unfortunately, the ability to execute larger tests is tremendously important, since such tests enable one to expose bugs more quickly. Successfully executing more tests per unit time is also desirable, since it allows for one to check for a greater variety of errors in the memory subsystem by utilizing a more diverse set of tests. This paper improves upon existing work by introducing an algorithm that not only reduces the time complexity of the verification process, but also facilitates the development of parallel algorithms for solving these problems. We first show performance improvements from a sequential approach and gain further performance from parallel implementations in OpenMP and CUDA. For large tests of interest, our GPU implementation achieves an average application speedup of 26.36x over existing techniques in use at NVIDIA.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2015 International Conference on Parallel Architecture and Compilation (PACT)

自引率

0.00%

发文量

期刊最新文献

Storage Consolidation on SSDs: Not Always a Panacea, but Can We Ease the Pain? AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures Scalable Task Scheduling and Synchronization Using Hierarchical Effects Scalable SIMD-Efficient Graph Processing on GPUs