
Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation: Latest Publications

Decidable verification under a causally consistent shared memory 因果一致共享内存下的可确定验证
O. Lahav, Udi Boker
Causal consistency is one of the most fundamental and widely used consistency models weaker than sequential consistency. In this paper, we study the verification of safety properties for finite-state concurrent programs running under a causally consistent shared memory model. We establish the decidability of this problem for a standard model of causal consistency (called also "Causal Convergence" and "Strong-Release-Acquire"). Our proof proceeds by developing an alternative operational semantics, based on the notion of a thread potential, that is equivalent to the existing declarative semantics and constitutes a well-structured transition system. In particular, our result allows for the verification of a large family of programs in the Release/Acquire fragment of C/C++11 (RA). Indeed, while verification under RA was recently shown to be undecidable for general programs, since RA coincides with the model we study here for write/write-race-free programs, the decidability of verification under RA for this widely used class of programs follows from our result. The novel operational semantics may also be of independent use in the investigation of weakly consistent shared memory models and their verification.
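To make the decidability argument concrete: well-structuredness is what makes safety checkable, via the standard backward-coverability procedure. The sketch below runs that generic procedure on a toy vector addition system (counter vectors ordered componentwise); it is not the paper's thread-potential semantics, only an illustration of the style of decision procedure that a well-structured transition system enables.

```python
# A generic backward-coverability check, the decision procedure that a
# well-structured transition system admits. The instance below is a toy
# vector addition system (counters ordered componentwise), not the paper's
# thread-potential semantics; names and numbers are illustrative.

def covers(a, b):
    """a >= b componentwise: the well-quasi-order on states."""
    return all(x >= y for x, y in zip(a, b))

def minimize(states):
    """Keep only the minimal elements of a basis of an upward-closed set."""
    return [s for s in states
            if not any(covers(s, t) and s != t for t in states)]

def backward_coverability(init, transitions, bad):
    """Can a state covering `bad` be reached from `init`? Safety holds iff not."""
    basis = [tuple(bad)]
    while True:
        new = []
        for tgt in basis:
            for d in transitions:
                # Minimal state that can fire d and land componentwise above tgt.
                pred = tuple(max(t - delta, 0) for t, delta in zip(tgt, d))
                if not any(covers(pred, b) for b in basis + new):
                    new.append(pred)
        if not new:
            break          # fixpoint reached; termination follows from Dickson's lemma
        basis = minimize(basis + new)
    return any(covers(init, b) for b in basis)

# Two counters; transition (-1, 2) consumes one token of the first counter
# and produces two of the second, (1, -1) goes the other way.
print(backward_coverability(init=(1, 0), transitions=[(-1, 2), (1, -1)], bad=(0, 2)))
# True: (1, 0) -> (0, 2), which covers the bad configuration.
```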
Citations: 14
Learning fast and precise numerical analysis 学习快速和精确的数值分析
Jingxuan He, Gagandeep Singh, Markus Püschel, Martin T. Vechev
Numerical abstract domains are a key component of modern static analyzers. Despite recent advances, precise analysis with highly expressive domains remains too costly for many real-world programs. To address this challenge, we introduce a new data-driven method, called LAIT, that produces a faster and more scalable numerical analysis without significant loss of precision. Our approach is based on the key insight that sequences of abstract elements produced by the analyzer contain redundancy which can be exploited to increase performance without compromising precision significantly. Concretely, we present an iterative learning algorithm that learns a neural policy that identifies and removes redundant constraints at various points in the sequence. We believe that our method is generic and can be applied to various numerical domains. We instantiate LAIT for the widely used Polyhedra and Octagon domains. Our evaluation of LAIT on a range of real-world applications with both domains shows that while the approach is designed to be generic, it is orders of magnitude faster on the most costly benchmarks than a state-of-the-art numerical library while maintaining close-to-original analysis precision. Further, LAIT outperforms hand-crafted heuristics and a domain-specific learning approach in terms of both precision and speed.
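The online part of such an approach can be pictured as follows: after expensive points such as joins, score each constraint of the abstract element with a learned policy and drop the ones predicted redundant; removing constraints only enlarges the element, so soundness is kept by construction. The sketch below uses a hand-written stand-in scorer and made-up names; LAIT itself learns the policy and operates on Polyhedra and Octagon elements inside a real analyzer.

```python
# Sketch of the online transformer: after an expensive point in the analysis,
# score every constraint of the abstract element with a policy and drop the
# ones predicted redundant. Removing constraints can only enlarge the element,
# so soundness is preserved by construction; only precision is at stake.
# `toy_policy_score` is a hand-written stand-in for the learned neural policy.

def toy_policy_score(constraint):
    """Stand-in scorer: prefer dropping dense rows with loose bounds."""
    coeffs, bound = constraint
    density = sum(1 for a in coeffs if a != 0)
    return density + 0.01 * abs(bound)          # higher score = more droppable

def prune_element(element, budget):
    """Drop the `budget` highest-scoring constraints of the element."""
    ranked = sorted(element, key=toy_policy_score, reverse=True)
    return ranked[budget:]

# A small polyhedron over (x, y); each constraint (coeffs, bound) is read as
# coeffs . (x, y) <= bound.
element = [((1, 0), 10), ((0, 1), 10), ((1, 1), 15), ((2, 3), 41)]
print(prune_element(element, budget=1))         # the densest, loosest row goes first
```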
Citations: 17
PMThreads: persistent memory threads harnessing versioned shadow copies PMThreads:利用版本控制的影子副本的持久内存线程
Zhenwei Wu, Kai Lu, A. Nisbet, Wen-zhe Zhang, M. Luján
Byte-addressable non-volatile memory (NVM) makes it possible to perform fast in-memory accesses to persistent data using standard load/store processor instructions. Some approaches for NVM are based on durable memory transactions and provide a persistent programming paradigm. However, they cannot be applied to existing multi-threaded applications without extensive source code modifications. Durable transactions typically rely on logging to enforce failure-atomic commits that include additional writes to NVM and considerable ordering overheads. This paper presents PMThreads, a novel user-space runtime that provides transparent failure-atomicity for lock-based parallel programs. A shadow DRAM page is used to buffer application writes for efficient propagation to a dual-copy NVM persistent storage framework during a global quiescent state. In this state, the working NVM copy and the crash-consistent copy of each page are atomically updated, and their roles are switched. A global quiescent state is entered at timed intervals by intercepting pthread lock acquire and release operations to ensure that no thread holds a lock to persistent data. Running on a dual-socket system with 20 cores, we show that PMThreads substantially outperforms the state-of-the-art Atlas, Mnemosyne and NVthreads systems for lock-based benchmarks (Phoenix, PARSEC benchmarks, and microbenchmark stress tests). Using Memcached, we also investigate the scalability of PMThreads and the effect of different time intervals for the quiescent state.
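A rough picture of the dual-copy versioning idea, as a minimal in-memory simulation with hypothetical names (the real runtime works at page granularity over DRAM and NVM and intercepts pthread locks to find quiescent points): writes go to a shadow buffer, and at a quiescent point they are propagated to the working persistent copy, whose role is then switched so that one complete copy always survives a crash.

```python
# In-memory simulation of the dual-copy versioning idea, with hypothetical
# names: application writes land in a shadow buffer (DRAM); at a global
# quiescent point (no pthread lock held) the dirty data is propagated to the
# working persistent copy, and the roles of the two copies are switched, so
# a crash always finds one complete copy. This is only a sketch of the
# protocol, not the page-granular runtime described in the paper.

class DualCopyStore:
    def __init__(self):
        self.shadow = {}                 # DRAM shadow buffer (dirty data)
        self.nvm = [{}, {}]              # two persistent copies
        self.working = 0                 # index of the working copy

    def write(self, key, value):
        self.shadow[key] = value         # fast path: buffered in DRAM

    def quiescent_commit(self):
        """Called only when no thread holds a lock on persistent data."""
        work = self.nvm[self.working]
        work.update(self.shadow)              # propagate dirty data to the working copy
        self.working ^= 1                     # the atomic role switch (persisted in reality)
        self.nvm[self.working].update(work)   # catch the new working copy up
        self.shadow.clear()

    def recover(self):
        """After a crash, the non-working copy is the last consistent snapshot."""
        return dict(self.nvm[self.working ^ 1])

store = DualCopyStore()
store.write("balance", 100)
store.quiescent_commit()
store.write("balance", 250)              # buffered but never committed
print(store.recover())                   # {'balance': 100}
```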
Citations: 25
Proving almost-sure termination by omega-regular decomposition 通过正则分解证明几乎肯定终止
Jianhui Chen, Fei He
Almost-sure termination is the most basic liveness property of probabilistic programs. We present a novel decomposition-based approach for proving almost-sure termination of probabilistic programs with complex control-flow structure and non-determinism. Our approach automatically decomposes the runs of the probabilistic program into a finite union of ω-regular subsets and then proves almost-sure termination of each subset based on the notion of localized ranking supermartingales. Compared to the lexicographic methods and the compositional methods, our approach does not require a lexicographic order over the ranking supermartingales as well as the so-called unaffecting condition. Thus it has high generality. We present the algorithm of our approach and prove its soundness, as well as its relative completeness. We show that our approach can be applied to some hard cases and the evaluation on the benchmarks of previous works shows the significant efficiency of our approach.
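For orientation, here is a textbook single-loop instance of the certificate the method builds on (not an example from the paper): a ranking supermartingale is a nonnegative expression that drops in expectation by a fixed amount at every loop iteration, which rules out non-termination with positive probability; the paper localizes such certificates to each ω-regular subset of runs.

```latex
% A textbook single-loop instance (not taken from the paper): the loop
%   while x > 0:  x := x + 1 with probability 1/3,  x := x - 1 otherwise
% is certified almost-surely terminating by the ranking supermartingale V(x) = x:
\[
  \mathbb{E}\bigl[V(x') \mid x\bigr]
    = \tfrac{1}{3}(x+1) + \tfrac{2}{3}(x-1)
    = x - \tfrac{1}{3}
    \le V(x) - \epsilon
  \qquad \text{with } \epsilon = \tfrac{1}{3},
\]
\[
  V(x) \ge 0 \ \text{ whenever the loop guard } x > 0 \text{ holds.}
\]
% A nonnegative quantity that drops in expectation by a fixed amount on every
% iteration cannot keep decreasing forever, so the loop terminates almost surely.
```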
Citations: 9
Debug information validation for optimized code 优化代码的调试信息验证
Yuanbo Li, Shuo Ding, Qirun Zhang, Davide Italiano
Almost all modern production software is compiled with optimization. Debugging optimized code is a desirable functionality. For example, developers usually perform post-mortem debugging on the coredumps produced by software crashes. Designing reliable debugging techniques for optimized code has been well-studied in the past. However, little is known about the correctness of the debug information generated by optimizing compilers when debugging optimized code. Optimizing compilers emit debug information (e.g., DWARF information) to support source code debuggers. Wrong debug information causes debuggers to either crash or to display wrong variable values. Existing debugger validation techniques only focus on testing the interactive aspect of debuggers for dynamic languages (i.e., with unoptimized code). Validating debug information for optimized code raises some unique challenges: (1) many breakpoints cannot be reached by debuggers due to code optimization; and (2) inspecting some arbitrary variables such as uninitialized variables introduces undefined behaviors. This paper presents the first generic framework for systematically testing debug information with optimized code. We introduce a novel concept called actionable program. An actionable program P⟨ s, v⟩ contains a program location s and a variable v to inspect. Our key insight is that in both the unoptimized program P⟨ s,v⟩ and the optimized program P⟨ s,v⟩′, debuggers should be able to stop at the program location s and inspect the value of the variable v without any undefined behaviors. Our framework generates actionable programs and does systematic testing by comparing the debugger output of P⟨ s, v⟩′ and the actual value of v at line s in P⟨ s, v⟩. We have applied our framework to two mainstream optimizing C compilers (i.e., GCC and LLVM). Our framework has led to 47 confirmed bug reports, 11 of which have already been fixed. Moreover, in three days, our technique has found 2 confirmed bugs in the Rust compiler. The results have demonstrated the effectiveness and generality of our framework.
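The validation loop is differential. A hedged sketch of such a harness is below: compile an actionable program at -O0 and -O2 with debug info, drive gdb in batch mode to the chosen line s, and compare the value it reports for v against the value the program actually computes there. The source file, the choice of ⟨s, v⟩, and the expected value are made up for illustration; the framework in the paper generates actionable programs automatically and handles unreachable breakpoints and undefined behavior.

```python
# Hedged sketch of the differential harness: compile an actionable program
# P<s, v> at -O0 and -O2 with debug info, drive gdb in batch mode to line s,
# and compare the value it reports for v with the value the program actually
# computes there. The source, the choice of <s, v>, and the expected value
# are made up; the real framework generates actionable programs automatically.
import pathlib
import subprocess
import tempfile

SRC = """#include <stdio.h>
int main(void) {
  int v = 0;
  for (int i = 0; i < 10; i++)
    v += i;
  printf("%d\\n", v);   /* line 6: inspect v here */
  return 0;
}
"""
LINE, VAR, EXPECTED = 6, "v", "45"       # v = 0 + 1 + ... + 9 before line 6 runs

def gdb_value_at(binary, line, var):
    out = subprocess.run(
        ["gdb", "--batch",
         "-ex", f"break test.c:{line}",
         "-ex", "run",
         "-ex", f"print {var}",
         binary],
        capture_output=True, text=True, timeout=60).stdout
    for tok in out.splitlines():          # gdb reports the value as "$1 = 45"
        if tok.startswith("$1 = "):
            return tok.split("=", 1)[1].strip()
    return None                           # breakpoint never hit

with tempfile.TemporaryDirectory() as d:
    src = pathlib.Path(d, "test.c")
    src.write_text(SRC)
    for opt in ("-O0", "-O2"):
        exe = pathlib.Path(d, "prog" + opt)
        subprocess.run(["gcc", "-g", opt, str(src), "-o", str(exe)], check=True)
        got = gdb_value_at(str(exe), LINE, VAR)
        print(opt, "debugger reports", VAR, "=", got, "(expected", EXPECTED + ")")
```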
Citations: 16
Type-directed scheduling of streaming accelerators 流式加速器的类型导向调度
David Durst, Matthew Feldman, Dillon Huff, David Akeley, Ross G. Daly, G. Bernstein, Marco Patrignani, K. Fatahalian, P. Hanrahan
Designing efficient, application-specialized hardware accelerators requires assessing trade-offs between a hardware module’s performance and resource requirements. To facilitate hardware design space exploration, we describe Aetherling, a system for automatically compiling data-parallel programs into statically scheduled, streaming hardware circuits. Aetherling contributes a space- and time-aware intermediate language featuring data-parallel operators that represent parallel or sequential hardware modules, and sequence data types that encode a module’s throughput by specifying when sequence elements are produced or consumed. As a result, well-typed operator composition in the space-time language corresponds to connecting hardware modules via statically scheduled, streaming interfaces. We provide rules for transforming programs written in a standard data-parallel language (that carries no information about hardware implementation) into equivalent space-time language programs. We then provide a scheduling algorithm that searches over the space of transformations to quickly generate area-efficient hardware designs that achieve a programmer-specified throughput. Using benchmarks from the image processing domain, we demonstrate that Aetherling enables rapid exploration of hardware designs with different throughput and area characteristics, and yields results that require 1.8-7.9× fewer FPGA slices than those of prior hardware generation systems.
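The typing idea can be illustrated in a few lines: a sequence type records how many elements stream over how many clock cycles, and operator composition is rejected unless the producer's output type matches the consumer's input type, which is exactly a statically scheduled streaming interface. The constructor and module names below are invented and this is not Aetherling's syntax; it only shows the throughput-matching check.

```python
# Tiny illustration of throughput-indexed stream types in the spirit of the
# space-time language: a sequence type records how many elements stream over
# how many clock cycles, and composition is rejected unless the producer's
# output type matches the consumer's input type. Names are invented.
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Seq:
    elems: int     # elements in the logical sequence
    cycles: int    # clock cycles over which they stream

    @property
    def throughput(self):
        return Fraction(self.elems, self.cycles)   # elements per cycle

@dataclass(frozen=True)
class Module:
    name: str
    in_ty: Seq
    out_ty: Seq

def compose(f: Module, g: Module) -> Module:
    """g after f; statically rejects rate mismatches."""
    if f.out_ty != g.in_ty:
        raise TypeError(f"cannot connect {f.name} ({f.out_ty}) to "
                        f"{g.name} ({g.in_ty})")
    return Module(f"{g.name} . {f.name}", f.in_ty, g.out_ty)

map4      = Module("map4",      Seq(4, 1), Seq(4, 1))   # fully parallel, 4 elems/cycle
reduce4   = Module("reduce4",   Seq(4, 4), Seq(1, 4))   # sequential, 1 elem/cycle in
serialize = Module("serialize", Seq(4, 1), Seq(4, 4))   # rate-changing buffer
try:
    compose(map4, reduce4)                 # rejected: 4 elems/1 cycle vs 4 elems/4 cycles
except TypeError as e:
    print(e)
pipeline = compose(compose(map4, serialize), reduce4)
print(pipeline.name, pipeline.in_ty, "->", pipeline.out_ty,
      "at", pipeline.out_ty.throughput, "elem/cycle out")
```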
Citations: 35
Optimizing homomorphic evaluation circuits by program synthesis and term rewriting 利用程序综合和项重写优化同态求值电路
Dongkwon Lee, Woosuk Lee, Hakjoo Oh, K. Yi
We present a new and general method for optimizing homomorphic evaluation circuits. Although fully homomorphic encryption (FHE) holds the promise of enabling safe and secure third-party computation, building FHE applications has been challenging due to their high computational costs. Domain-specific optimizations require a great deal of expertise on the underlying FHE schemes, and FHE compilers that aim to lower this hurdle generate outcomes that are typically sub-optimal, as they rely on manually developed optimization rules. In this paper, building on prior work on FHE compilers, we propose a method for automatically learning and using optimization rules for FHE circuits. Our method focuses on reducing the maximum multiplicative depth of FHE circuits, the decisive performance bottleneck, by combining program synthesis and term rewriting. It first uses program synthesis to learn equivalences of small circuits as rewrite rules from a set of training circuits. Then, we perform term rewriting on the input circuit to obtain a new circuit that has lower multiplicative depth. Our rewriting method maximally generalizes the learned rules based on equational matching, and its soundness and termination properties are formally proven. Experimental results show that our method generates circuits that can be homomorphically evaluated 1.18x–3.71x faster (with a geometric mean of 2.05x) than the state-of-the-art method. Our method is also orthogonal to existing domain-specific optimizations.
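The optimization objective and the rewriting step are easy to state concretely: the cost of a circuit is its maximum multiplicative depth, and a rewrite rule is applied wherever it lowers that cost. The toy sketch below uses one hand-written rebalancing rule; in the paper the rules are learned offline by program synthesis and their equivalence is established rather than assumed.

```python
# Toy sketch of depth-directed term rewriting on a circuit of +/* gates: the
# cost is the maximum multiplicative depth, and a rewrite rule is applied
# wherever it lowers that cost. The single rebalancing rule below is
# hand-written, not a rule learned by synthesis.

def mul(a, b): return ("*", a, b)

def mult_depth(e):
    if isinstance(e, str):                     # input wire
        return 0
    op, l, r = e
    d = max(mult_depth(l), mult_depth(r))
    return d + 1 if op == "*" else d           # only multiplications add depth

def rebalance(e):
    """One rule: ((a*b)*c)*d  ->  (a*b)*(c*d), applied only if depth drops."""
    if isinstance(e, str):
        return e
    op, l, r = e
    l, r = rebalance(l), rebalance(r)
    if (op == "*" and isinstance(l, tuple) and l[0] == "*"
            and isinstance(l[1], tuple) and l[1][0] == "*"):
        ab, c = l[1], l[2]
        candidate = ("*", ab, ("*", c, r))
        if mult_depth(candidate) < mult_depth((op, l, r)):
            return candidate
    return (op, l, r)

expr = mul(mul(mul("a", "b"), "c"), "d")           # ((a*b)*c)*d, depth 3
opt = rebalance(expr)                              # (a*b)*(c*d), depth 2
print(mult_depth(expr), "->", mult_depth(opt))     # 3 -> 2
```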
Citations: 16
BlankIt library debloating: getting what you want instead of cutting what you don’t BlankIt库的精简:得到你想要的,而不是删减你不想要的
C. Porter, Girish Mururu, Prithayan Barua, S. Pande
Modern software systems make extensive use of libraries derived from C and C++. Because of the lack of memory safety in these languages, however, the libraries may suffer from vulnerabilities, which can expose the applications to potential attacks. For example, a very large number of return-oriented programming gadgets exist in glibc that allow stitching together semantically valid but malicious Turing-complete and -incomplete programs. While CVEs get discovered and often patched and remedied, such gadgets serve as building blocks of future undiscovered attacks, opening an ever-growing set of possibilities for generating malicious programs. Thus, significant reduction in the quantity and expressiveness (utility) of such gadgets for libraries is an important problem. In this work, we propose a new approach for handling an application’s library functions that focuses on the principle of “getting only what you want.” This is a significant departure from the current approaches that focus on “cutting what is unwanted.” Our approach focuses on activating/deactivating library functions on demand in order to reduce the dynamically linked code surface, so that the possibilities of constructing malicious programs diminishes substantially. The key idea is to load only the set of library functions that will be used at each library call site within the application at runtime. This approach of demand-driven loading relies on an input-aware oracle that predicts a near-exact set of library functions needed at a given call site during the execution. The predicted functions are loaded just in time and unloaded on return. We present a decision-tree based predictor, which acts as an oracle, and an optimized runtime system, which works directly with library binaries like GNU libc and libstdc++. We show that on average, the proposed scheme cuts the exposed code surface of libraries by 97.2%, reduces ROP gadgets present in linked libraries by 97.9%, achieves a prediction accuracy in most cases of at least 97%, and adds a runtime overhead of 18% on all libraries (16% for glibc, 2% for others) across all benchmarks of SPEC 2006. Further, we demonstrate BlankIt on two real-world applications, sshd and nginx, with a high amount of debloating and low overheads.
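The runtime cycle reduces to: at a library call site, feed the call-site context to a decision-tree predictor, expose exactly the predicted functions, and blank them again on return. The sketch below simulates that cycle with a hand-written stand-in tree and made-up function names; the real system predicts over binary code and enforces the decision with page permissions, with a fallback when the prediction misses.

```python
# Toy sketch of the predict/load/unload cycle: before a library call, a
# decision-tree predictor maps the call-site context to the set of library
# functions expected to run; only those are exposed, and everything is
# blanked again on return. The tree and the function names are made up, and
# "loading" is a Python set rather than actual page-permission changes.

LIBRARY = {"printf", "vfprintf", "strchrnul", "malloc", "free", "memcpy"}

def predict_needed(call_site, fmt_has_args):
    """Hand-written stand-in for the learned decision tree."""
    if call_site == "printf":
        if fmt_has_args:
            return {"printf", "vfprintf", "strchrnul", "malloc"}
        return {"printf", "vfprintf"}
    return {"memcpy"}

class DemandLoader:
    def __init__(self):
        self.resident = set()                    # functions currently exposed

    def call(self, call_site, *, fmt_has_args):
        needed = predict_needed(call_site, fmt_has_args)
        self.resident = set(needed)              # load just in time
        print(f"{call_site}: exposed {len(needed)}/{len(LIBRARY)} library functions")
        # ... the intercepted library call would execute here; a mispredicted
        # callee would fault and fall back to loading the whole library ...
        self.resident.clear()                    # blank everything on return

loader = DemandLoader()
loader.call("printf", fmt_has_args=True)
loader.call("printf", fmt_has_args=False)
```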
Citations: 27
Behavioral simulation for smart contracts 智能合约的行为模拟
Sidi Mohamed Beillahi, Gabriela F. Cretu-Ciocarlie, M. Emmi, C. Enea
While smart contracts have the potential to revolutionize many important applications like banking, trade, and supply-chain, their reliable deployment begs for rigorous formal verification. Since most smart contracts are not annotated with formal specifications, general verification of functional properties is impeded. In this work, we propose an automated approach to verify unannotated smart contracts against specifications ascribed to a few manually-annotated contracts. In particular, we propose a notion of behavioral refinement, which implies inheritance of functional properties. Furthermore, we propose an automated approach to inductive proof, by synthesizing simulation relations on the states of related contracts. Empirically, we demonstrate that behavioral simulations can be synthesized automatically for several ubiquitous classes like tokens, auctions, and escrow, thus enabling the verification of unannotated contracts against functional specifications.
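Checking a simulation relation over finite abstractions is straightforward to state: every step of the candidate contract from a related state must be matched by the reference contract, ending again in related states. The toy check below uses two tiny hand-written escrow state machines and a given relation; the paper synthesizes such relations automatically for real contracts with unbounded state.

```python
# Toy check of a behavioral simulation between two contracts modeled as
# finite transition systems: every action the candidate contract can take
# from a related state must be matched by the reference contract, ending in
# related states again. The relation here is given by hand; the paper
# synthesizes such relations automatically for real (unbounded) contracts.

def is_simulation(rel, steps_ref, steps_cand):
    """rel: pairs (ref_state, cand_state); steps_*: state -> {(action, next_state)}."""
    for ref, cand in rel:
        for action, cand_next in steps_cand(cand):
            matched = any(a == action and (ref_next, cand_next) in rel
                          for a, ref_next in steps_ref(ref))
            if not matched:
                return False
    return True

# Reference escrow specification: init -> funded -> released
def steps_ref(s):
    return {"init": {("deposit", "funded")},
            "funded": {("release", "released")},
            "released": set()}[s]

# Candidate (unannotated) escrow with its own state names
def steps_cand(s):
    return {"start": {("deposit", "locked")},
            "locked": {("release", "done")},
            "done": set()}[s]

rel = {("init", "start"), ("funded", "locked"), ("released", "done")}
print(is_simulation(rel, steps_ref, steps_cand))   # True
```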
Citations: 9
Silq: a high-level quantum language with safe uncomputation and intuitive semantics Silq:一种高级量子语言,具有安全的非计算和直观的语义
Benjamin Bichsel, Maximilian Baader, Timon Gehr, Martin T. Vechev
Existing quantum languages force the programmer to work at a low level of abstraction leading to unintuitive and cluttered code. A fundamental reason is that dropping temporary values from the program state requires explicitly applying quantum operations that safely uncompute these values. We present Silq, the first quantum language that addresses this challenge by supporting safe, automatic uncomputation. This enables an intuitive semantics that implicitly drops temporary values, as in classical computation. To ensure physicality of Silq's semantics, its type system leverages novel annotations to reject unphysical programs. Our experimental evaluation demonstrates that Silq programs are not only easier to read and write, but also significantly shorter than equivalent programs in other quantum languages (on average -46% for Q#, -38% for Quipper), while using only half the number of quantum primitives.
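What uncomputation means operationally can already be seen on classical basis states: a temporary computed into a fresh ancilla must be returned to 0 by reversing its computation before the ancilla is dropped, otherwise discarding it would disturb the rest of the (quantum) state. The sketch below does the reverse step by hand with a Toffoli gate and simulates only basis states; Silq's contribution is that the language inserts this step automatically and its type system rejects programs where it is not possible.

```python
# Uncomputation on a classical basis state: a temporary t = a AND b is
# computed into a fresh ancilla with a Toffoli gate, used once, and then
# uncomputed by applying the same Toffoli again, leaving the ancilla at 0 so
# it can be dropped without disturbing the rest of the state. Only basis
# states (no superposition) are simulated here; the reverse step is manual.

def toffoli(state, c1, c2, target):
    """Reversible AND: flip `target` iff both control bits are 1."""
    bits = list(state)
    if bits[c1] and bits[c2]:
        bits[target] ^= 1
    return tuple(bits)

def cnot(state, control, target):
    bits = list(state)
    if bits[control]:
        bits[target] ^= 1
    return tuple(bits)

# wires: a, b, ancilla t, result r
state = (1, 1, 0, 0)
state = toffoli(state, 0, 1, 2)   # compute   t = a AND b   -> (1, 1, 1, 0)
state = cnot(state, 2, 3)         # use it:   r ^= t        -> (1, 1, 1, 1)
state = toffoli(state, 0, 1, 2)   # uncompute t             -> (1, 1, 0, 1)
print(state)                      # ancilla is back to 0 and safe to drop
```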
Citations: 89