Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation: latest papers, page 6

Repairing and mechanising the JavaScript relaxed memory model

Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2020-05-21 DOI: 10.1145/3385412.3385973

Conrad Watt, Christopher Pulte, A. Podkopaev, G. Barbier, Stephen Dolan, Shaked Flur, Jean Pichon-Pharabod, Shu-yu Guo

Modern JavaScript includes the SharedArrayBuffer feature, which provides access to true shared memory concurrency. SharedArrayBuffers are simple linear buffers of bytes, and the JavaScript specification defines an axiomatic relaxed memory model to describe their behaviour. While this model is heavily based on the C/C++11 model, it diverges in some key areas. JavaScript chooses to give a well-defined semantics to data-races, unlike the "undefined behaviour" of C/C++11. Moreover, the JavaScript model is mixed-size. This means that its accesses are not to discrete locations, but to (possibly overlapping) ranges of bytes. We show that the model, in violation of the design intention, does not support a compilation scheme to ARMv8 which is used in practice. We propose a correction, which also incorporates a previously proposed fix for a failure of the model to provide Sequential Consistency of Data-Race-Free programs (SC-DRF), an important correctness condition. We use model checking, in Alloy, to generate small counter-examples for these deficiencies, and investigate our correction. To accomplish this, we also develop a mixed-size extension to the existing ARMv8 axiomatic model. Guided by our Alloy experimentation, we mechanise (in Coq) the JavaScript model (corrected and uncorrected), our ARMv8 model, and, for the corrected JavaScript model, a "model-internal" SC-DRF proof and a compilation scheme correctness proof to ARMv8. In addition, we investigate a non-mixed-size subset of the corrected JavaScript model, and give proofs of compilation correctness for this subset to x86-TSO, Power, RISC-V, ARMv7, and (again) ARMv8, via the Intermediate Memory Model (IMM). As a result of our work, the JavaScript standards body (ECMA TC39) will include fixes for both issues in an upcoming edition of the specification.
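
The mixed-size aspect mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper or from the JavaScript specification: the `Access` record and the `overlap` and `same_location` helpers are hypothetical names, assumed here only to show why accesses to byte ranges can overlap partially instead of simply hitting the same or different locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical memory event: a read or write of `size` bytes at `offset`."""
    kind: str    # "R" for a read, "W" for a write
    offset: int  # index of the first byte touched in the shared buffer
    size: int    # number of bytes touched


def overlap(a: Access, b: Access) -> range:
    """Return the (possibly empty) range of byte indices touched by both accesses."""
    start = max(a.offset, b.offset)
    end = min(a.offset + a.size, b.offset + b.size)
    return range(start, max(start, end))


def same_location(a: Access, b: Access) -> bool:
    """True only when both accesses cover exactly the same bytes."""
    return (a.offset, a.size) == (b.offset, b.size)


w32 = Access("W", 0, 4)  # 4-byte write covering bytes 0..3
r16 = Access("R", 2, 2)  # 2-byte read covering bytes 2..3
w8 = Access("W", 4, 1)   # 1-byte write covering byte 4
```

Here `w32` and `r16` share bytes 2 and 3 without being accesses to the same location, which is the situation a model over discrete locations cannot express.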

Citations: 13

PMEvo: portable inference of port mappings for out-of-order processors by evolutionary optimization

Pub Date : 2020-04-21 DOI: 10.1145/3385412.3385995

Fabian Ritter, Sebastian Hack

Achieving peak performance in a computer system requires optimizations in every layer of the system, be it hardware or software. A detailed understanding of the underlying hardware, and especially the processor, is crucial to optimize software. One key criterion for the performance of a processor is its ability to exploit instruction-level parallelism. This ability is determined by the port mapping of the processor, which describes the execution units of the processor for each instruction. Processor manufacturers usually do not share the port mappings of their microarchitectures. While approaches to automatically infer port mappings from experiments exist, they are based on processor-specific hardware performance counters that are not available on every platform. We present PMEvo, a framework to automatically infer port mappings solely based on the measurement of the execution time of short instruction sequences. PMEvo uses an evolutionary algorithm that evaluates the fitness of candidate mappings with an analytical throughput model formulated as a linear program. Our prototype implementation infers a port mapping for Intel's Skylake architecture that predicts measured instruction throughput with an accuracy that is competitive to existing work. Furthermore, it finds port mappings for AMD's Zen+ architecture and the ARM Cortex-A72 architecture, which are out of scope of existing techniques.
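
To make the idea of an analytical throughput model concrete, here is a small Python sketch under simplifying assumptions. It maps each instruction directly to a set of ports, which is coarser than the model the paper describes, and the names `throughput`, `experiment`, and `port_mapping` are hypothetical. For this simple form, the optimum of the linear program (distribute each instruction's executions over its allowed ports so that the busiest port is as idle as possible) equals the load of the most congested subset of ports, which the sketch enumerates instead of calling an LP solver.

```python
from fractions import Fraction
from itertools import combinations


def throughput(experiment, port_mapping):
    """Estimated cycles per iteration of an instruction mix.

    experiment:   dict from instruction name to its count per iteration
    port_mapping: dict from instruction name to the frozenset of ports it may use
    """
    ports = sorted(set().union(*port_mapping.values()))
    best = Fraction(0)
    # Enumerate every non-empty subset of ports (exponential, fine for a sketch).
    for r in range(1, len(ports) + 1):
        for subset in combinations(ports, r):
            q = set(subset)
            # Work that cannot escape q: instructions whose ports all lie in q.
            load = sum(count for instr, count in experiment.items()
                       if port_mapping[instr] <= q)
            best = max(best, Fraction(load, len(q)))
    return best


# Assumed toy mapping: "add" may run on ports 0 and 1, "mul" only on port 0.
mapping = {"add": frozenset({0, 1}), "mul": frozenset({0})}
```

An evolutionary search in this spirit would score a candidate `mapping` by how closely `throughput` matches measured execution times over many such experiments.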

Citations: 12

LLHD: a multi-level intermediate representation for hardware description languages

Pub Date : 2020-04-07 DOI: 10.1145/3385412.3386024

Fabian Schuiki, Andreas Kurth, T. Grosser, L. Benini

Modern Hardware Description Languages (HDLs) such as SystemVerilog or VHDL are, due to their sheer complexity, insufficient to transport designs through modern circuit design flows. Instead, each design automation tool lowers HDLs to its own Intermediate Representation (IR). These tools are monolithic and mostly proprietary, disagree in their implementation of HDLs, and while many redundant IRs exist, no IR today can be used through the entire circuit design flow. To solve this problem, we propose the LLHD multi-level IR. LLHD is designed as a simple, unambiguous reference description of a digital circuit, yet fully captures existing HDLs. We show this with our reference compiler on designs as complex as full CPU cores. LLHD comes with lowering passes to a hardware-near structural IR, which readily integrates with existing tools. LLHD establishes the basis for innovation in HDLs and tools without redundant compilers or disjoint IRs. For instance, we implement an LLHD simulator that runs up to 2.4× faster than commercial simulators but produces equivalent, cycle-accurate results. An initial vertically-integrated research prototype is capable of representing all levels of the IR, implements lowering from the behavioural to the structural IR, and covers a sufficient subset of SystemVerilog to support a full CPU design.

Citations: 36

NVTraverse: in NVRAM data structures, the destination is more important than the journey

Pub Date : 2020-04-06 DOI: 10.1145/3385412.3386031

Michal Friedman, N. Ben-David, Yuanhao Wei, G. Blelloch, E. Petrank

The recent availability of fast, dense, byte-addressable non-volatile memory has led to increasing interest in the problem of designing durable data structures that can recover from system crashes. However, designing durable concurrent data structures that are correct and efficient has proven to be very difficult, leading to many inefficient or incorrect algorithms. In this paper, we present a general transformation that takes a lock-free data structure from a general class called traversal data structure (that we formally define) and automatically transforms it into an implementation of the data structure for the NVRAM setting that is provably durably linearizable and highly efficient. The transformation hinges on the observation that many data structure operations begin with a traversal phase that does not need to be persisted, and thus we only begin persisting when the traversal reaches its destination. We demonstrate the transformation's efficiency through extensive measurements on a system with Intel's recently released Optane DC persistent memory, showing that it can outperform competitors on many workloads.
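
The central observation, that the traversal phase need not be persisted, can be mimicked with a single-threaded Python simulation. This is only a sketch under strong assumptions: `PersistentList` and `_flush` are hypothetical, the list is neither lock-free nor backed by real NVRAM, and the exact placement of flushes in the paper's transformation is not reproduced. The counter merely shows that the number of flushes per update stays constant however long the traversal is.

```python
class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = nxt


class PersistentList:
    """Sorted singly linked list; a counter stands in for cache-line flushes."""

    def __init__(self):
        self.head = Node(float("-inf"))  # sentinel smaller than every key
        self.flushes = 0

    def _flush(self, node):
        # Stand-in for a flush-and-fence sequence on real persistent memory.
        self.flushes += 1

    def _traverse(self, key):
        # Traversal phase: reads only, nothing is persisted.
        pred = self.head
        while pred.next is not None and pred.next.key < key:
            pred = pred.next
        return pred

    def insert(self, key):
        pred = self._traverse(key)
        # Destination reached: persist the node the update depends on.
        self._flush(pred)
        if pred.next is not None and pred.next.key == key:
            return False  # key already present, nothing to change
        node = Node(key, pred.next)
        self._flush(node)   # persist the new node before it becomes reachable
        pred.next = node
        self._flush(pred)   # persist the link that publishes the new node
        return True

    def keys(self):
        out, cur = [], self.head.next
        while cur is not None:
            out.append(cur.key)
            cur = cur.next
        return out
```

Inserting keys in increasing order forces ever longer traversals, yet each successful insert in this sketch still costs three flushes.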

Citations: 51

Responsive parallelism with futures and state

Pub Date : 2020-04-06 DOI: 10.1145/3385412.3386013

Stefan K. Muller, Kyle Singer, N. Goldstein, Umut A. Acar, Kunal Agrawal, I. Lee

Motivated by the increasing shift to multicore computers, recent work has developed language support for responsive parallel applications that mix compute-intensive tasks with latency-sensitive, usually interactive, tasks. These developments include calculi that allow assigning priorities to threads, type systems that can rule out priority inversions, and accompanying cost models for predicting responsiveness. These advances share one important limitation: all of this work assumes purely functional programming. This is a significant restriction, because many realistic interactive applications, from games to robots to web servers, use mutable state, e.g., for communication between threads. In this paper, we lift the restriction concerning the use of state. We present λi4, a calculus with implicit parallelism in the form of prioritized futures and mutable state in the form of references. Because both futures and references are first-class values, λi4 programs can exhibit complex dependencies, including interaction between threads and with the external world (users, network, etc). To reason about the responsiveness of λi4 programs, we extend traditional graph-based cost models for parallelism to account for dependencies created via mutable state, and we present a type system to outlaw priority inversions that can lead to unbounded blocking. We show that these techniques are practical by implementing them in C++ and present an empirical evaluation.
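
The kind of priority inversion the type system is meant to rule out can be illustrated with a run-time check in Python. This is a loose, sequential analogue and not the calculus itself: `PFuture`, `touch`, and `PriorityInversion` are hypothetical names, priorities are assumed to be plain integers (larger means more urgent), and the paper enforces the restriction statically, whereas this sketch only detects it dynamically.

```python
class PriorityInversion(Exception):
    """Raised when higher-priority code would wait on lower-priority work."""


class PFuture:
    """A lazily evaluated computation tagged with a priority."""

    def __init__(self, priority, thunk):
        self.priority = priority
        self._thunk = thunk
        self._done = False
        self._value = None

    def force(self):
        # Evaluate at most once and cache the result.
        if not self._done:
            self._value = self._thunk()
            self._done = True
        return self._value


def touch(current_priority, fut):
    """Wait for `fut` from code running at `current_priority`."""
    if fut.priority < current_priority:
        # Waiting here could block urgent work behind less urgent work.
        raise PriorityInversion(
            f"priority {current_priority} code waits on priority {fut.priority} future")
    return fut.force()


background = PFuture(0, lambda: sum(range(1000)))  # compute-intensive, low priority
interactive = PFuture(1, lambda: "click handled")  # latency-sensitive, high priority
```

Low-priority code may touch `interactive`, but high-priority code touching `background` is rejected.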

Citations: 9

Typilus: neural type hints

Pub Date : 2020-04-06 DOI: 10.1145/3385412.3385997

Miltiadis Allamanis, Earl T. Barr, Soline Ducousso, Zheng Gao

Type inference over partial contexts in dynamically typed languages is challenging. In this work, we present a graph neural network model that predicts types by probabilistically reasoning over a program’s structure, names, and patterns. The network uses deep similarity learning to learn a TypeSpace — a continuous relaxation of the discrete space of types — and how to embed the type properties of a symbol (i.e. identifier) into it. Importantly, our model can employ one-shot learning to predict an open vocabulary of types, including rare and user-defined ones. We realise our approach in Typilus for Python that combines the TypeSpace with an optional type checker. We show that Typilus accurately predicts types. Typilus confidently predicts types for 70% of all annotatable symbols; when it predicts a type, that type optionally type checks 95% of the time. Typilus can also find incorrect type annotations; two important and popular open source libraries, fairseq and allennlp, accepted our pull requests that fixed the annotation errors Typilus discovered.
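
One plausible way to predict types from an embedding space is a nearest-neighbour lookup, sketched below in Python. The sketch assumes the embeddings have already been produced by some trained model; `predict_type`, `type_map`, the two-dimensional vectors, and the distance-weighted vote are illustrative assumptions, not the tool's actual interface. It also shows why an open vocabulary is possible: a previously unseen type becomes predictable as soon as a single labelled embedding is added.

```python
import math
from collections import defaultdict


def predict_type(query, type_map, k=3):
    """Predict a type for `query` from its k nearest labelled embeddings.

    type_map: list of (embedding, type_name) pairs, embeddings as float tuples.
    Returns (type_name, confidence), where confidence is the winning type's
    share of the distance-weighted votes.
    """
    neighbours = sorted(type_map, key=lambda entry: math.dist(query, entry[0]))[:k]
    votes = defaultdict(float)
    for embedding, type_name in neighbours:
        # Closer neighbours count more; the epsilon avoids division by zero.
        votes[type_name] += 1.0 / (math.dist(query, embedding) + 1e-9)
    best = max(votes, key=votes.get)
    return best, votes[best] / sum(votes.values())


# Assumed toy embeddings for symbols with known annotations.
type_map = [((0.0, 0.0), "int"), ((0.1, 0.0), "int"), ((1.0, 1.0), "str")]

# One-shot extension: a single example of a rare, user-defined type.
type_map.append(((5.0, 5.0), "torch.Tensor"))
```

A low confidence value could be used to abstain from predicting, and a type checker could then filter the remaining suggestions.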

Citations: 87

On the principles of differentiable quantum programming languages

Pub Date : 2020-04-02 DOI: 10.1145/3385412.3386011

Shaopeng Zhu, S. Hung, Shouvanik Chakrabarti, Xiaodi Wu

Variational Quantum Circuits (VQCs), or the so-called quantum neural-networks, are predicted to be one of the most important near-term quantum applications, not only because of their similar promises as classical neural-networks, but also because of their feasibility on near-term noisy intermediate-size quantum (NISQ) machines. The need for gradient information in the training procedure of VQC applications has stimulated the development of auto-differentiation techniques for quantum circuits. We propose the first formalization of this technique, not only in the context of quantum circuits but also for imperative quantum programs (e.g., with controls), inspired by the success of differentiable programming languages in classical machine learning. In particular, we overcome a few unique difficulties caused by exotic quantum features (such as quantum no-cloning) and provide a rigorous formulation of differentiation applied to bounded-loop imperative quantum programs, its code-transformation rules, as well as a sound logic to reason about their correctness. Moreover, we have implemented our code transformation in OCaml and demonstrated the resource-efficiency of our scheme both analytically and empirically. We also conduct a case study of training a VQC instance with controls, which shows the advantage of our scheme over existing auto-differentiation for quantum circuits without controls.
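
As background on how gradients of a quantum circuit can be obtained, the following Python sketch applies the widely used parameter-shift rule to a one-qubit circuit. It is not the code transformation proposed in the paper: the circuit (a single RY rotation applied to |0>, measured in the Z basis) and the names `expectation_z` and `parameter_shift_gradient` are assumptions chosen so that the result can be checked against the closed form, since the expectation equals cos(theta) and its derivative is -sin(theta).

```python
import math


def expectation_z(theta):
    """Expectation of Z after applying RY(theta) to |0>.

    RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>, so the expectation is
    P(0) - P(1) = cos(theta).
    """
    p0 = math.cos(theta / 2) ** 2
    p1 = math.sin(theta / 2) ** 2
    return p0 - p1


def parameter_shift_gradient(f, theta):
    """Derivative of f at theta from two evaluations shifted by +/- pi/2.

    Exact for a gate of the form exp(-i * theta * P / 2) with P a Pauli operator.
    """
    return (f(theta + math.pi / 2) - f(theta - math.pi / 2)) / 2


grad = parameter_shift_gradient(expectation_z, 0.3)
```

The gradient is itself computed by running the same circuit at shifted parameters, which is what makes the rule usable on quantum hardware, where intermediate states cannot be copied or inspected.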

Citations: 6

Exact and approximate methods for proving unrealizability of syntax-guided synthesis problems

Pub Date : 2020-04-02 DOI: 10.1145/3385412.3385979

Qinheping Hu, John Cyphert, Loris D'antoni, T. Reps

We consider the problem of automatically establishing that a given syntax-guided-synthesis (SyGuS) problem is unrealizable (i.e., has no solution). We formulate the problem of proving that a SyGuS problem is unrealizable over a finite set of examples as one of solving a set of equations: the solution yields an overapproximation of the set of possible outputs that any term in the search space can produce on the given examples. If none of the possible outputs agrees with all of the examples, our technique has proven that the given SyGuS problem is unrealizable. We then present an algorithm for exactly solving the set of equations that result from SyGuS problems over linear integer arithmetic (LIA) and LIA with conditionals (CLIA), thereby showing that LIA and CLIA SyGuS problems over finitely many examples are decidable. We implement the proposed technique and algorithms in a tool called Nay. Nay can prove unrealizability for 70/132 existing SyGuS benchmarks, with running times comparable to those of the state-of-the-art tool Nope. Moreover, Nay can solve 11 benchmarks that Nope cannot solve.
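
The approximate side of this idea can be sketched in Python for a toy grammar. The sketch makes several assumptions that are not taken from the paper: the grammar is `Start ::= Start + Start | leaf` for a fixed set of leaf terms, the overapproximation tracks output vectors only modulo a chosen `modulus`, and the function names are hypothetical. The equation for the set of possible output vectors, S = leaves ∪ {a + b | a, b ∈ S}, is solved by fixpoint iteration over residues; if the expected outputs fall outside the solution, no term in the grammar can match all examples.

```python
from itertools import product


def reachable_residues(leaf_outputs, modulus):
    """Least fixpoint of S = leaves | {a + b : a, b in S} on residue vectors.

    leaf_outputs: one tuple per leaf term, holding its output on each example.
    """
    reached = {tuple(v % modulus for v in leaf) for leaf in leaf_outputs}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for a, b in product(list(reached), repeat=2):
            s = tuple((x + y) % modulus for x, y in zip(a, b))
            if s not in reached:
                reached.add(s)
                changed = True
    return reached


def proves_unrealizable(leaf_outputs, expected, modulus):
    """True if no term can produce `expected`; False means inconclusive."""
    target = tuple(v % modulus for v in expected)
    return target not in reachable_residues(leaf_outputs, modulus)


# Grammar: Start ::= Start + Start | 2*x | 2, with input examples x = 1 and x = 4.
leaves = [(2 * 1, 2 * 4), (2, 2)]  # outputs of the leaf terms 2*x and 2
```

Every term of this grammar yields even outputs, so a specification demanding f(1) = 3 and f(4) = 9 is shown unrealizable, while f(1) = 4 and f(4) = 10 (met by `2*x + 2`) is correctly left undecided by the abstraction.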

Citations: 16

Crafty: efficient, HTM-compatible persistent transactions

Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2020-04-01 DOI: 10.1145/3385412.3385991

K. Genç, Michael D. Bond, G. Xu

Byte-addressable persistent memory, such as Intel/Micron 3D XPoint, is an emerging technology that bridges the gap between volatile memory and persistent storage. Data in persistent memory survives crashes and restarts; however, it is challenging to ensure that this data is consistent after failures. Existing approaches incur significant performance costs to ensure crash consistency. This paper introduces Crafty, a new approach for ensuring consistency and atomicity on persistent memory operations using commodity hardware with existing hardware transactional memory (HTM) capabilities, while incurring low overhead. Crafty employs a novel technique called nondestructive undo logging that leverages commodity HTM to control persist ordering. Our evaluation shows that Crafty outperforms state-of-the-art prior work under low contention, and performs competitively under high contention.
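
For readers unfamiliar with undo logging, the sketch below shows the plain, generic form of the idea: record the old value before each write so that an interrupted transaction can be rolled back after a crash. It deliberately omits HTM and the nondestructive aspect that the abstract attributes to Crafty, and the class and method names are hypothetical. It also assumes that every statement reaches persistent memory immediately and in program order, which real hardware does not guarantee.

```python
_MISSING = object()  # marks "key did not exist before the write"


class PersistentRegion:
    """Toy stand-in for persistent memory: data and undo_log survive a crash."""

    def __init__(self, initial):
        self.data = dict(initial)
        self.undo_log = []  # (key, old_value) pairs of the open transaction

    def write(self, key, value):
        # Log the old value first, so the write can be rolled back later.
        self.undo_log.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def commit(self):
        # The transaction completed, so its undo entries are no longer needed.
        self.undo_log.clear()

    def recover(self):
        # After a crash, undo the incomplete transaction in reverse order.
        for key, old in reversed(self.undo_log):
            if old is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old
        self.undo_log.clear()


pm = PersistentRegion({"a": 100, "b": 0})
pm.write("a", 70)   # simulated crash before the matching write to "b"
pm.recover()
print(pm.data)      # {'a': 100, 'b': 0}
```

The rollback is only correct if each log entry becomes persistent before the data write it protects. Enforcing that ordering cheaply is the part of the problem the abstract says Crafty addresses with commodity HTM.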

Citations: 20

FreezeML: complete and easy type inference for first-class polymorphism

Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2020-04-01 DOI: 10.1145/3385412.3386003

Frank Emrich, S. Lindley, Jan Stolarek, J. Cheney, Jonathan Coates

ML is remarkable in providing statically typed polymorphism without the programmer ever having to write any type annotations. The cost of this parsimony is that the programmer is limited to a form of polymorphism in which quantifiers can occur only at the outermost level of a type and type variables can be instantiated only with monomorphic types. Type inference for unrestricted System F-style polymorphism is undecidable in general. Nevertheless, the literature abounds with proposals to bridge the gap between ML and System F. We put forth a new proposal, FreezeML, a conservative extension of ML with two new features. First, let- and lambda-binders may be annotated with arbitrary System F types. Second, variable occurrences may be frozen, explicitly disabling instantiation. FreezeML is equipped with type-preserving translations to and from System F and admits a type inference algorithm, an extension of algorithm W, that is sound and complete and yields principal types.

Citations: 6