Proceedings of the ACM on Programming Languages: Latest Articles, Page 6

Spirea: A Mechanized Concurrent Separation Logic for Weak Persistent Memory

Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Proceedings of the ACM on Programming Languages

Pub Date : 2023-10-16 DOI: 10.1145/3622820

Simon Friis Vindum, Lars Birkedal

Weak persistent memory (a.k.a. non-volatile memory) is an emerging technology that offers fast byte-addressable durable main memory. A wealth of algorithms and libraries has been developed to explore this exciting technology. As noted by others, this has led to a significant verification gap. Towards closing this gap, we present Spirea, the first concurrent separation logic for verification of programs under a weak persistent memory model. Spirea is based on the Iris and Perennial verification frameworks, and by combining features from these logics with novel techniques it supports high-level modular reasoning about crash-safe and thread-safe programs and libraries. Spirea is fully mechanized in the Coq proof assistant and allows for interactive development of proofs with the Iris Proof Mode. We use Spirea to verify several challenging examples with modular specifications. We show how our logic can verify thread-safety and crash-safety of non-blocking durable data structures with null-recovery, in particular the Treiber stack and the Michael-Scott queue adapted to persistent memory. This is the first time durable data structures have been verified with a program logic.



Citations: 1

Mechanizing Session-Types using a Structural View: Enforcing Linearity without Linearity

Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Proceedings of the ACM on Programming Languages

Pub Date : 2023-10-16 DOI: 10.1145/3622810

Chuta Sano, Ryan Kavanagh, Brigitte Pientka

Session types employ a linear type system that ensures that communication channels cannot be implicitly copied or discarded. As a result, many mechanizations of these systems require modeling channel contexts and carefully ensuring that they treat channels linearly. We demonstrate a technique that localizes linearity conditions as additional predicates embedded within type judgments, which allows us to use structural typing contexts instead of linear ones. This technique is especially relevant when leveraging (weak) higher-order abstract syntax to handle channel mobility and the intricate binding structures that arise in session-typed systems. Following this approach, we mechanize a session-typed system based on classical linear logic and its type preservation proof in the proof assistant Beluga, which uses the logical framework LF as its encoding language. We also prove adequacy for our encoding. This shows the tractability and effectiveness of our approach in modelling substructural systems such as session-typed languages.


Citations: 0

Simple Reference Immutability for System F<:

Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Proceedings of the ACM on Programming Languages

Pub Date : 2023-10-16 DOI: 10.1145/3622828

Edward Lee, Ondřej Lhoták

Reference immutability is a type-based technique for taming mutation that has long been studied in the context of object-oriented languages, like Java. Recently, though, languages like Scala have blurred the lines between functional programming languages and object-oriented programming languages. We explore how reference immutability interacts with features commonly found in these hybrid languages, in particular with higher-order functions, polymorphism, and subtyping. We construct a calculus System F<:M which encodes a reference immutability system as a simple extension of System F<: and prove that it satisfies the standard soundness and immutability safety properties.


Citations: 0

A Grounded Conceptual Model for Ownership Types in Rust

Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Proceedings of the ACM on Programming Languages

Pub Date : 2023-10-16 DOI: 10.1145/3622841

Will Crichton, Gavin Gray, Shriram Krishnamurthi

Programmers learning Rust struggle to understand ownership types, Rust’s core mechanism for ensuring memory safety without garbage collection. This paper describes our attempt to systematically design a pedagogy for ownership types. First, we studied Rust developers’ misconceptions of ownership to create the Ownership Inventory, a new instrument for measuring a person’s knowledge of ownership. We found that Rust learners could not connect Rust’s static and dynamic semantics, such as determining why an ill-typed program would (or would not) exhibit undefined behavior. Second, we created a conceptual model of Rust’s semantics that explains borrow checking in terms of flow-sensitive permissions on paths into memory. Third, we implemented a Rust compiler plugin that visualizes programs under the model. Fourth, we integrated the permissions model and visualizations into a broader pedagogy of ownership by writing a new ownership chapter for The Rust Programming Language, a popular Rust textbook. Fifth, we evaluated an initial deployment of our pedagogy against the original version, using reader responses to the Ownership Inventory as a point of comparison. Thus far, the new pedagogy has improved learner scores on the Ownership Inventory by an average of 9



Citations: 0

Resource-Aware Soundness for Big-Step Semantics

Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Proceedings of the ACM on Programming Languages

Pub Date : 2023-10-16 DOI: 10.1145/3622843

Riccardo Bianchini, Francesco Dagnino, Paola Giannini, Elena Zucca

We extend the semantics and type system of a lambda calculus equipped with common constructs to be resource-aware. That is, reduction is instrumented to keep track of the usage of resources, and the type system guarantees, besides standard soundness, that for well-typed programs there is a computation where no needed resource gets exhausted. The resource-aware extension is parametric on an arbitrary grade algebra, and does not require ad-hoc changes to the underlying language. To this end, the semantics needs to be formalized in big-step style; as a consequence, expressing and proving (resource-aware) soundness is challenging, and is achieved by applying recent techniques based on coinductive reasoning.


Citations: 0

Formal Abstractions for Packet Scheduling

Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Proceedings of the ACM on Programming Languages

Pub Date : 2023-10-16 DOI: 10.1145/3622845

Anshuman Mohan, Yunhe Liu, Nate Foster, Tobias Kappé, Dexter Kozen

Early programming models for software-defined networking (SDN) focused on basic features for controlling network-wide forwarding paths, but more recent work has considered richer features, such as packet scheduling and queueing, that affect performance. In particular, PIFO trees, proposed by Sivaraman et al., offer a flexible and efficient primitive for programmable packet scheduling. Prior work has shown that PIFO trees can express a wide range of practical algorithms including strict priority, weighted fair queueing, and hierarchical schemes. However, the semantic properties of PIFO trees are not well understood. This paper studies PIFO trees from a programming language perspective. We formalize the syntax and semantics of PIFO trees in an operational model that decouples the scheduling policy running on a tree from the topology of the tree. Building on this formalization, we develop compilation algorithms that allow the behavior of a PIFO tree written against one topology to be realized using a tree with a different topology. Such a compiler could be used to optimize an implementation of PIFO trees, or realize a logical PIFO tree on a target with a fixed topology baked into the hardware. To support experimentation, we develop a software simulator for PIFO trees, and we present case studies illustrating its behavior on standard and custom algorithms.
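To make the PIFO tree idea concrete, the following is a minimal Python sketch, not the simulator described in the abstract. It assumes the usual reading of a PIFO (push-in first-out) queue: items may be pushed at any rank but are only popped from the head. Internal nodes queue child indices and leaves queue packets. The class names, the path encoding (a list of `(child, rank)` pairs ending in a leaf rank), and the strict-priority example are all illustrative assumptions.

```python
import heapq
from itertools import count


class PIFO:
    """Push-in first-out queue: push at any rank, pop only the lowest rank."""

    def __init__(self):
        self._heap = []
        self._tick = count()  # tie-breaker so equal ranks pop in FIFO order

    def push(self, rank, item):
        heapq.heappush(self._heap, (rank, next(self._tick), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]


class Leaf:
    """A leaf queues the packets themselves."""

    def __init__(self):
        self.pifo = PIFO()

    def push(self, packet, path):
        (rank,) = path  # at a leaf, the path is just the packet's rank
        self.pifo.push(rank, packet)

    def pop(self):
        return self.pifo.pop()


class Internal:
    """An internal node queues indices of the children to visit on pop."""

    def __init__(self, children):
        self.children = children
        self.pifo = PIFO()

    def push(self, packet, path):
        (child, rank), *rest = path
        self.pifo.push(rank, child)  # remember which child to visit, and when
        self.children[child].push(packet, rest)

    def pop(self):
        return self.children[self.pifo.pop()].pop()


def strict_priority_demo():
    # Assumed policy: class 0 beats class 1; arrival order within a class.
    root = Internal([Leaf(), Leaf()])
    arrivals = [("A", 1), ("B", 0), ("C", 1), ("D", 0)]
    for time, (packet, cls) in enumerate(arrivals):
        root.push(packet, [(cls, cls), time])
    return [root.pop() for _ in arrivals]


print(strict_priority_demo())
```

The scheduling policy lives entirely in the ranks passed to `push`, while the tree shape is fixed by how `Internal` and `Leaf` are nested, which is one way to picture the policy/topology decoupling the abstract mentions.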



Citations: 0

Inference of Resource Management Specifications

Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Proceedings of the ACM on Programming Languages

Pub Date : 2023-10-16 DOI: 10.1145/3622858

Narges Shadab, Pritam Gharat, Shrey Tiwari, Michael D. Ernst, Martin Kellogg, Shuvendu K. Lahiri, Akash Lal, Manu Sridharan

A resource leak occurs when a program fails to free some finite resource after it is no longer needed. Such leaks are a significant cause of real-world crashes and performance problems. Recent work proposed an approach to prevent resource leaks based on checking resource management specifications. A resource management specification expresses how the program allocates resources, passes them around, and releases them; it also tracks the ownership relationship between objects and resources, and aliasing relationships between objects. While this specify-and-verify approach has several advantages compared to prior techniques, the need to manually write annotations presents a significant barrier to its practical adoption. This paper presents a novel technique to automatically infer a resource management specification for a program, broadening the applicability of specify-and-check verification for resource leaks. Inference in this domain is challenging because resource management specifications differ significantly in nature from the types that most inference techniques target. Further, for practical effectiveness, we desire a technique that can infer the resource management specification intended by the developer, even in cases when the code does not fully adhere to that specification. We address these challenges through a set of inference rules carefully designed to capture real-world coding patterns, yielding an effective fixed-point-based inference algorithm. We have implemented our inference algorithm in two different systems, targeting programs written in Java and C#. In an experimental evaluation, our technique inferred 85.5% of the annotations that programmers had written manually for the benchmarks. Further, the verifier issued nearly the same rate of false alarms with the manually-written and automatically-inferred annotations.
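As a rough illustration of what "fixed-point-based inference" means here, the Python sketch below propagates a single hypothetical fact ("this method releases the resource") through a call graph until nothing changes. It is not the paper's rule set: the function name, the call-graph encoding, and the single propagation rule are assumptions made for this example only.

```python
def infer_releasers(calls, direct_releasers):
    """Return every method that releases the resource, directly or transitively.

    calls: dict mapping a method name to the set of methods it calls.
    direct_releasers: methods known up front to release the resource.
    """
    releasers = set(direct_releasers)
    changed = True
    while changed:  # iterate until a full pass adds nothing: the fixed point
        changed = False
        for method, callees in calls.items():
            # Assumed rule: calling any releaser makes the caller a releaser.
            if method not in releasers and callees & releasers:
                releasers.add(method)
                changed = True
    return releasers


# Hypothetical call graph for illustration.
example_calls = {
    "shutdown": {"closeAll"},
    "closeAll": {"close"},
    "log": {"format"},
}
print(sorted(infer_releasers(example_calls, {"close"})))
```

Because each pass can only add methods and the set of methods is finite, the loop is guaranteed to terminate; the inferred set could then be turned into annotations for a separate checker to verify.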



Citations: 0

Data Extraction via Semantic Regular Expression Synthesis

Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Proceedings of the ACM on Programming Languages

Pub Date : 2023-10-16 DOI: 10.1145/3622863

Qiaochu Chen, Arko Banerjee, Çağatay Demiralp, Greg Durrett, Işıl Dillig

Many data extraction tasks of practical relevance require not only syntactic pattern matching but also semantic reasoning about the content of the underlying text. While regular expressions are very well suited for tasks that require only syntactic pattern matching, they fall short for data extraction tasks that involve both a syntactic and semantic component. To address this issue, we introduce semantic regexes, a generalization of regular expressions that facilitates combined syntactic and semantic reasoning about textual data. We also propose a novel learning algorithm that can synthesize semantic regexes from a small number of positive and negative examples. Our proposed learning algorithm uses a combination of neural sketch generation and compositional type-directed synthesis for fast and effective generalization from a small number of examples. We have implemented these ideas in a new tool called Smore and evaluated it on representative data extraction tasks involving several textual datasets. Our evaluation shows that semantic regexes can better support complex data extraction tasks than standard regular expressions and that our learning algorithm significantly outperforms existing tools, including state-of-the-art neural networks and program synthesis tools.



Citations: 2

Beacons: An End-to-End Compiler Framework for Predicting and Utilizing Dynamic Loop Characteristics

Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Proceedings of the ACM on Programming Languages

Pub Date : 2023-10-16 DOI: 10.1145/3622803

Girish Mururu, Sharjeel Khan, Bodhisatwa Chatterjee, Chao Chen, Chris Porter, Ada Gavrilovska, Santosh Pande

Efficient management of shared resources is a critical problem in high-performance computing (HPC) environments. Existing workload management systems often promote non-sharing of resources among different co-executing applications to achieve performance isolation. Such schemes lead to poor resource utilization and suboptimal process throughput, adversely affecting user productivity. Tackling this problem in a scalable fashion is extremely challenging, since it requires the workload scheduler to possess an in-depth knowledge about various application resource requirements and runtime phases at fine granularities within individual applications. In this work, we show that applications’ resource requirements and execution phase behaviour can be captured in a scalable and lightweight manner at runtime by estimating important program artifacts termed “dynamic loop characteristics”. Specifically, we propose a solution to the problem of efficient workload scheduling by designing a compiler and runtime cooperative framework that leverages novel loop-based compiler analysis for resource allocation. We present Beacons Framework, an end-to-end compiler and scheduling framework, that estimates dynamic loop characteristics, encapsulates them in compiler-instrumented beacons in an application, and broadcasts them during application runtime, for proactive workload scheduling. We focus on estimating four important loop characteristics: loop trip-count, loop timing, loop memory footprint, and loop data-reuse behaviour, through a combination of compiler analysis and machine learning. The novelty of the Beacons Framework also lies in its ability to tackle irregular loops that exhibit complex control flow with indeterminate loop bounds involving structure fields, aliased variables and function calls, which are highly prevalent in modern workloads. At the backend, Beacons Framework entails a proactive workload scheduler that leverages the runtime information to orchestrate aggressive process co-locations, for maximizing resource concurrency, without causing cache thrashing. Our results show that Beacons Framework can predict different loop characteristics with an accuracy of 85% to 95% on average, and the proactive scheduler obtains an average throughput improvement of 1.9x (up to 3.2x) over the state-of-the-art schedulers on an Amazon Graviton2 machine on consolidated workloads involving 1000-10000 co-executing processes, across 51 benchmarks.



Citations: 0

Interactive Debugging of Datalog Programs

Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Proceedings of the ACM on Programming Languages

Pub Date : 2023-10-16 DOI: 10.1145/3622824

André Pacak, Sebastian Erdweg

Datalog is used for complex programming tasks nowadays, consisting of numerous inter-dependent predicates. But Datalog lacks interactive debugging techniques that support the stepwise execution and inspection of the execution state. In this paper, we propose interactive debugging of Datalog programs following a top-down evaluation strategy called recursive query/subquery. While the recursive query/subquery approach is well-known in the literature, we are the first to provide a complete programming-language semantics based on it. Specifically, we develop the first small-step operational semantics for top-down Datalog, where subqueries occur as nested intermediate terms. The small-step semantics forms the basis of step-into interactions in the debugger. Moreover, we show how step-over interactions can be realized efficiently based on a hybrid Datalog semantics that adds a bottom-up database to our top-down operational semantics. We implemented a debugger for core Datalog following these semantics and explain how to adopt it for debugging the frontend languages of Soufflé and IncA. Our evaluation shows that our hybrid Datalog semantics can be used to debug real-world Datalog programs with realistic workloads.
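As a rough illustration of the two evaluation directions the abstract contrasts, the sketch below evaluates a hard-coded reachability program bottom-up (deriving every fact) and top-down (following only the subqueries a given query needs). It is not the authors' semantics or debugger: the program, the `EDGES` facts, and the function names `bottom_up` and `top_down` are all assumptions made for this example, and the `seen` set is a simplification that only keeps a subquery from being asked twice.

```python
# Illustrative only: a hard-coded Datalog program, not the paper's semantics.
#   path(X, Y) :- edge(X, Y).
#   path(X, Y) :- edge(X, Z), path(Z, Y).
EDGES = {("a", "b"), ("b", "c"), ("c", "d")}  # hypothetical edge facts


def bottom_up(edges):
    """Naive bottom-up evaluation: apply the rules until no new fact appears."""
    path = set(edges)  # rule 1: every edge is a path
    while True:
        # rule 2: join edge(X, Z) with the path(Z, Y) facts derived so far
        derived = {(x, y) for (x, z) in edges for (z2, y) in path if z == z2}
        if derived <= path:
            return path
        path |= derived


def top_down(edges, source, trace=None, seen=None):
    """Goal-directed evaluation of the query path(source, Y).

    Each recursive call is a subquery; `trace` records the subqueries in the
    order a debugger could step into them.
    """
    trace = [] if trace is None else trace
    seen = set() if seen is None else seen
    trace.append(f"path({source}, Y)")
    seen.add(source)
    answers = set()
    for (x, z) in sorted(edges):
        if x != source:
            continue
        answers.add(z)  # rule 1
        if z not in seen:  # rule 2, skipping subqueries that were already asked
            sub_answers, _ = top_down(edges, z, trace, seen)
            answers |= sub_answers
    return answers, trace


if __name__ == "__main__":
    answers, trace = top_down(EDGES, "a")
    print(sorted(answers))
    print(trace)
```

The top-down trace lists one entry per subquery, which is roughly the kind of intermediate state a step-into interaction would expose, while the bottom-up result plays the role of a precomputed database of facts that a top-down query could look up instead of re-deriving.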

Citations: 0