Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation: latest papers, page 2

Question selection for interactive program synthesis

Pub Date : 2020-06-11 DOI: 10.1145/3385412.3386025

Ruyi Ji, Jingjing Liang, Yingfei Xiong, Lu Zhang, Zhenjiang Hu

Interactive program synthesis aims to solve the ambiguity in specifications, and selecting the proper question to minimize the rounds of interactions is critical to the performance of interactive program synthesis. In this paper we address this question selection problem and propose two algorithms. SampleSy approximates a state-of-the-art strategy proposed for optimal decision tree and has a short response time to enable interaction. EpsSy further reduces the rounds of interactions by approximating SampleSy with a bounded error rate. To implement the two algorithms, we further propose VSampler, an approach to sampling programs from a probabilistic context-free grammar based on version space algebra. The evaluation shows the effectiveness of both algorithms.
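
The question selection problem can be illustrated with a small sketch. It assumes that candidate programs are plain Python callables, that a "question" is an input whose expected output the user supplies, and that the selection rule is a simple greedy one: pick the input whose worst-case answer leaves the fewest sampled programs. This rule is chosen here for illustration and is not taken from the paper; the names `pick_question`, `interact`, and `candidates` are hypothetical.

```python
from collections import Counter

def pick_question(samples, questions):
    """Return the input whose worst-case answer keeps the fewest sampled programs."""
    def worst_case(q):
        # Group the sampled programs by their output on q; the largest
        # group is what survives if the user happens to give that answer.
        return max(Counter(p(q) for p in samples).values())
    return min(questions, key=worst_case)

def interact(samples, questions, oracle):
    """Ask questions until the remaining samples agree on every input."""
    remaining = list(samples)
    asked = []
    while True:
        # Only inputs on which the remaining programs still disagree are useful.
        useful = [q for q in questions if len({p(q) for p in remaining}) > 1]
        if not useful:
            return remaining, asked
        q = pick_question(remaining, useful)
        answer = oracle(q)  # the user's answer to the question
        remaining = [p for p in remaining if p(q) == answer]
        asked.append(q)

# Hypothetical candidate programs over integers (stand-ins for sampled programs).
candidates = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x * x,
    lambda x: abs(x) + 1,
]
target = candidates[3]  # the program the simulated user has in mind
survivors, asked = interact(candidates, [0, 1, 2, -1, -2], target)
print(len(survivors), asked)
```

In this toy run a single well-chosen input separates all four candidates, whereas a poorly chosen one (such as `1`, on which three candidates agree) would need follow-up questions.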

Citations: 23

First-order quantified separators

Pub Date : 2020-06-11 DOI: 10.1145/3385412.3386018

Jason R. Koenig, O. Padon, N. Immerman, A. Aiken

Quantified first-order formulas, often with quantifier alternations, are increasingly used in the verification of complex systems. While automated theorem provers for first-order logic are becoming more robust, invariant inference tools that handle quantifiers are currently restricted to purely universal formulas. We define and analyze first-order quantified separators and their application to inferring quantified invariants with alternations. A separator for a given set of positively and negatively labeled structures is a formula that is true on positive structures and false on negative structures. We investigate the problem of finding a separator from the class of formulas in prenex normal form with a bounded number of quantifiers and show this problem is NP-complete by reduction to and from SAT. We also give a practical separation algorithm, which we use to demonstrate the first invariant inference procedure able to infer invariants with quantifier alternations.
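
The definition of a separator can be made concrete with a minimal sketch. It assumes finite structures with a single binary relation `E`, encoded as a `(domain, edges)` pair, and formulas written by hand as Python predicates. The encoding and the function names are illustrative assumptions, and the sketch only checks a given formula; it does not implement the paper's separation algorithm.

```python
def forall_exists_edge(structure):
    """Evaluate  forall x. exists y. E(x, y)  on a finite structure."""
    domain, edges = structure
    return all(any((x, y) in edges for y in domain) for x in domain)

def forall_forall_edge(structure):
    """Evaluate  forall x. forall y. E(x, y)  on a finite structure."""
    domain, edges = structure
    return all((x, y) in edges for x in domain for y in domain)

def is_separator(formula, positives, negatives):
    """A separator is true on every positive and false on every negative structure."""
    return (all(formula(s) for s in positives)
            and not any(formula(s) for s in negatives))

# Hypothetical labeled structures: (domain, set of E-edges).
positives = [
    ({0, 1}, {(0, 1), (1, 0)}),
    ({0, 1, 2}, {(0, 1), (1, 2), (2, 2)}),
]
negatives = [
    ({0, 1}, {(0, 1)}),  # element 1 has no outgoing edge
]

print(is_separator(forall_exists_edge, positives, negatives))
print(is_separator(forall_forall_edge, positives, negatives))
```

Here the formula with a quantifier alternation separates the structures, while the purely universal candidate fails because it is already false on the positive structures.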

Citations: 23

Type error feedback via analytic program repair

Pub Date : 2020-06-11 DOI: 10.1145/3385412.3386005

Georgios Sakkas, Madeline Endres, B. Cosman, Westley Weimer, Ranjit Jhala

We introduce Analytic Program Repair, a data-driven strategy for providing feedback for type errors via repairs for the erroneous program. Our strategy is based on the insight that similar errors have similar repairs. Thus, we show how to use a training dataset of pairs of ill-typed programs and their fixed versions to: (1) learn a collection of candidate repair templates by abstracting and partitioning the edits made in the training set into a representative set of templates; (2) predict the appropriate template from a given error, by training multi-class classifiers on the repair templates used in the training set; (3) synthesize a concrete repair from the template by enumerating and ranking correct (e.g. well-typed) terms matching the predicted template. We have implemented our approach in Rite: a type error reporting tool for OCaml programs. We present an evaluation of the accuracy and efficiency of Rite on a corpus of 4,500 ill-typed OCaml programs drawn from two instances of an introductory programming course, and a user study of the quality of the generated error messages that shows the locations and final repair quality to be better than the state-of-the-art tool in a statistically significant manner.

Citations: 21

Debugging and detecting numerical errors in computation with posits

Pub Date : 2020-06-11 DOI: 10.1145/3385412.3386004

Sangeeta Chowdhary, Jay P. Lim, Santosh Nagarakatte

Posit is a recently proposed alternative to the floating point representation (FP). It provides tapered accuracy. Given a fixed number of bits, the posit representation can provide better precision for some numbers compared to FP, which has generated significant interest in numerous domains. Being a representation with tapered accuracy, it can introduce high rounding errors for numbers outside the above golden zone. Programmers currently lack tools to detect and debug errors while programming with posits. This paper presents PositDebug, a compile-time instrumentation that performs shadow execution with high precision values to detect various errors in computation using posits. To assist the programmer in debugging the reported error, PositDebug also provides directed acyclic graphs of instructions, which are likely responsible for the error. A contribution of this paper is the design of the metadata per memory location for shadow execution that enables productive debugging of errors with long-running programs. We have used PositDebug to detect and debug errors in various numerical applications written using posits. To demonstrate that these ideas are applicable even for FP programs, we have built a shadow execution framework for FP programs that is an order of magnitude faster than Herbgrind.
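
The idea of shadow execution can be sketched in a few lines. The sketch makes two loud assumptions: Python's 64-bit `float` stands in for the low-precision representation (Python has no built-in posit type), and exact `Fraction` arithmetic plays the role of the high-precision shadow. The `Shadowed` wrapper and `suspicious` helper are hypothetical names; PositDebug itself works through compile-time instrumentation rather than a hand-written wrapper class.

```python
from fractions import Fraction

class Shadowed:
    """A value computed in machine precision, paired with an exact shadow value."""

    def __init__(self, value, shadow=None):
        self.value = float(value)
        # A fresh input is assumed exact, so its shadow equals its value.
        self.shadow = Fraction(value) if shadow is None else shadow

    def __add__(self, other):
        return Shadowed(self.value + other.value, self.shadow + other.shadow)

    def __sub__(self, other):
        return Shadowed(self.value - other.value, self.shadow - other.shadow)

    def __mul__(self, other):
        return Shadowed(self.value * other.value, self.shadow * other.shadow)

    def relative_error(self):
        """Relative error of the machine value with respect to the shadow."""
        if self.shadow == 0:
            return 0.0 if self.value == 0.0 else float("inf")
        return float(abs(Fraction(self.value) - self.shadow) / abs(self.shadow))

def suspicious(x, threshold=1e-6):
    """Flag a result whose relative error exceeds an (arbitrary) threshold."""
    return x.relative_error() > threshold

big = Shadowed(1e16)
one = Shadowed(1.0)
cancelled = (big + one) - big          # the added 1.0 is lost to rounding
benign = Shadowed(0.5) + Shadowed(0.25)

print(cancelled.value, cancelled.shadow, suspicious(cancelled))
print(benign.value, benign.shadow, suspicious(benign))
```

Because each operation updates both components, the point at which the machine value drifts away from its shadow can be located by checking `relative_error()` after every step.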

Citations: 14

Verifying concurrent search structure templates

Pub Date : 2020-06-11 DOI: 10.1145/3385412.3386029

Siddharth Krishna, Nisarg Patel, D. Shasha

Concurrent separation logics have had great success reasoning about concurrent data structures. This success stems from their application of modularity on multiple levels, leading to proofs that are decomposed according to program structure, program state, and individual threads. Despite these advances, it remains difficult to achieve proof reuse across different data structure implementations. For the large class of search structures, we demonstrate how one can achieve further proof modularity by decoupling the proof of thread safety from the proof of structural integrity. We base our work on the template algorithms of Shasha and Goodman that dictate how threads interact but abstract from the concrete layout of nodes in memory. Building on the recently proposed flow framework of compositional abstractions and the separation logic Iris, we show how to prove correctness of template algorithms, and how to instantiate them to obtain multiple verified implementations. We demonstrate our approach by mechanizing the proofs of three concurrent search structure templates, based on link, give-up, and lock-coupling synchronization, and deriving verified implementations based on B-trees, hash tables, and linked lists. These case studies include algorithms used in real-world file systems and databases, which have been beyond the capability of prior automated or mechanized verification techniques. In addition, our approach reduces proof complexity and is able to achieve significant proof reuse.

Citations: 14

From folklore to fact: comparing implementations of stacks and continuations

Pub Date : 2020-06-11 DOI: 10.1145/3385412.3385994

K. Farvardin, John H. Reppy

The efficient implementation of function calls and non-local control transfers is a critical part of modern language implementations and is important in the implementation of everything from recursion, higher-order functions, concurrency and coroutines, to task-based parallelism. In a compiler, these features can be supported by a variety of mechanisms, including call stacks, segmented stacks, and heap-allocated continuation closures. An implementor of a high-level language with advanced control features might ask the question "what is the best choice for my implementation?" Unfortunately, the current literature does not provide much guidance, since previous studies suffer from various flaws in methodology and are outdated for modern hardware. In the absence of recent, well-normalized measurements and a holistic overview of their implementation specifics, the path of least resistance when choosing a strategy is to trust folklore, but the folklore is also suspect. This paper attempts to remedy this situation by providing an "apples-to-apples" comparison of six different approaches to implementing call stacks and continuations. This comparison uses the same source language, compiler pipeline, LLVM backend, and runtime system, with the only differences being those required by the differences in implementation strategy. We compare the implementation challenges of the different approaches, their sequential performance, and their suitability to support advanced control mechanisms, including supporting heavily threaded code. In addition to the comparison of implementation strategies, the paper's contributions also include a number of useful implementation techniques that we discovered along the way.
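
As a rough illustration of the difference between a call stack and heap-allocated continuation closures, the following sketch computes a factorial twice: once in direct style, where pending work lives in stack frames, and once in continuation-passing style, where pending work lives in closures on the heap and a small driver loop (a trampoline) runs them. The function names are hypothetical, and the sketch is a language-level analogy only; it does not reproduce any of the six strategies as the paper implements them.

```python
import math

def fact_direct(n):
    """Direct style: each pending multiplication occupies a stack frame."""
    return 1 if n == 0 else n * fact_direct(n - 1)

def fact_cps(n, k):
    """Continuation-passing style: pending work is captured in the closure k.

    Returns a zero-argument thunk so that the driver loop, not the Python
    call stack, sequences the computation.
    """
    if n == 0:
        return lambda: k(1)
    return lambda: fact_cps(n - 1, lambda r: (lambda: k(n * r)))

def trampoline(thunk):
    """Run thunks until a non-callable result is produced."""
    while callable(thunk):
        thunk = thunk()
    return thunk

small = trampoline(fact_cps(5, lambda r: r))
# Depth 2000 exceeds CPython's default recursion limit for fact_direct,
# but the heap-allocated continuations handle it without deep recursion.
large = trampoline(fact_cps(2000, lambda r: r))
print(small, large == math.factorial(2000))
```

The direct version is simpler and relies on the native stack, while the closure version trades allocation for the ability to go arbitrarily deep and to capture the rest of the computation as a first-class value.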

Citations: 17

Adaptive low-overhead scheduling for periodic and reactive intermittent execution

Pub Date : 2020-06-11 DOI: 10.1145/3385412.3385998

Kiwan Maeng, Brandon Lucia

Batteryless energy-harvesting devices eliminate the need for batteries in deployed sensor systems, enabling longer lifetime and easier maintenance. However, such devices cannot support an event-driven execution model (e.g., periodic or reactive execution), restricting the use cases and hampering real-world deployment. Without knowing exactly how much energy can be harvested in the future, robustly scheduling periodic and reactive workloads is challenging. We introduce CatNap, an event-driven energy-harvesting system with a new programming model that asks the programmer to express a subset of the code that is time-critical. CatNap isolates and reserves energy for the time-critical code, reliably executing it on schedule while deferring execution of the rest of the code. CatNap degrades execution quality when a decrease in the incoming power renders it impossible to maintain its schedule. Our evaluation on a real energy-harvesting setup shows that CatNap works well with end-to-end, real-world deployment settings. CatNap reliably runs periodic events where a prior system misses the deadline by 7.3x, and supports reactive applications with a 100% success rate where prior work shows less than a 2% success rate.

Citations: 50

Improving program locality in the GC using hotness

Pub Date : 2020-06-11 DOI: 10.1145/3385412.3385977

A. Yang, Erik Österlund, Tobias Wrigstad

The hierarchical memory system, with increasingly small and increasingly fast memory closer to the CPU, has long been at the heart of hiding, or mitigating, the performance gap between memories and processors. To utilise this hardware, programs must be written to exhibit good object locality. In languages like C/C++, programmers can carefully plan how objects should be laid out (albeit in a time-consuming and error-prone way); for managed languages, especially ones with moving garbage collectors, a manually created optimal layout may be destroyed in the process of object relocation. For managed languages that present an abstract view of memory, the solution lies in making the garbage collector aware of object locality, and having it strive to achieve and maintain good locality, even in the face of multi-phased programs that exhibit different behaviour across different phases. This paper presents a GC design, HCSGC, that dynamically reorganises objects in the order mutators access them, and additionally strives to separate frequently and infrequently used objects in memory. This improves locality and the efficiency of hardware prefetching. Identifying frequently used objects is done at run-time, with small overhead. HCSGC also offers tunability, for shifting relocation work towards mutators, or for more or less aggressive object relocation. The ideas are evaluated in the context of the ZGC collector on OpenJDK and yield performance improvements of 5% (tradebeans), 9% (h2) and an impressive 25–45% (JGraphT), all with 95% confidence. For SPECjbb, results are inconclusive due to a fluctuating baseline.

Citations: 8

SympleGraph: distributed graph processing with precise loop-carried dependency guarantee

Pub Date : 2020-06-11 DOI: 10.1145/3385412.3385961

Youwei Zhuo, Jingji Chen, Qinyi Luo, Yanzhi Wang, Hailong Yang, D. Qian, Xuehai Qian

Graph analytics is an important way to understand relationships in real-world applications. In the age of big data, graphs have grown to billions of edges. This motivates distributed graph processing. Graph processing frameworks ask programmers to specify graph computations in user-defined functions (UDFs) of a graph-oriented programming model. Due to the nature of distributed execution, current frameworks cannot precisely enforce the semantics of UDFs, leading to unnecessary computation and communication. In essence, there exists a gap between the programming model and runtime execution. This paper proposes SympleGraph, a novel distributed graph processing framework that precisely enforces loop-carried dependency, i.e., when a condition is satisfied by a neighbor, all following neighbors can be skipped. SympleGraph instruments the UDFs to express the loop-carried dependency, then the distributed execution framework enforces the precise semantics by performing dependency propagation dynamically. Enforcing loop-carried dependency requires the sequential processing of the neighbors of each vertex distributed in different nodes. Therefore, the major challenge is to enable sufficient parallelism to achieve high performance. We propose to use circulant scheduling in the framework to allow different machines to process disjoint sets of edges/vertices in parallel while satisfying the sequential requirement. It achieves a good trade-off between precise semantics and parallelism. The significant speedups in most graphs and algorithms indicate that the benefits of eliminating unnecessary computation and communication overshadow the reduced parallelism. Communication efficiency is further optimized by 1) selectively propagating dependency for large-degree vertices to increase net benefits; 2) double buffering to hide communication latency. In a 16-node cluster, SympleGraph outperforms the state-of-the-art systems Gemini and D-Galois on average by 1.42× and 3.30×, and up to 2.30× and 7.76×, respectively. The communication reduction compared to Gemini is 40.95% on average and up to 67.48%.
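
The effect of a loop-carried dependency can be sketched with one step of a bottom-up breadth-first search for a single vertex whose neighbors are partitioned across machines. The sketch is a single-process simulation under stated assumptions: the function name `bottom_up_step`, the list-of-lists partitioning, and the `propagate` flag are all hypothetical, and the machines are visited one after another in a plain loop, so circulant scheduling and real communication are not modeled.

```python
def bottom_up_step(neighbors_by_machine, frontier, propagate):
    """Find a parent for one vertex among neighbors split across machines.

    Returns (parent, edges_examined). With propagate=True, the fact that an
    earlier machine already found a parent is passed on, so later machines
    skip this vertex entirely.
    """
    parent, examined = None, 0
    for local_neighbors in neighbors_by_machine:
        if propagate and parent is not None:
            break  # dependency propagated: remaining machines do no work
        for u in local_neighbors:
            examined += 1
            if u in frontier:
                if parent is None:
                    parent = u
                break  # within one machine, the UDF's own early exit applies
    return parent, examined

# Hypothetical partitioning of one vertex's neighbors over three machines.
neighbors_by_machine = [[5, 2], [7, 3], [4, 9]]
frontier = {2, 3}

with_dep = bottom_up_step(neighbors_by_machine, frontier, propagate=True)
without_dep = bottom_up_step(neighbors_by_machine, frontier, propagate=False)
print(with_dep, without_dep)
```

Both runs pick the same parent, but without propagation every machine scans its local neighbors independently, which is the unnecessary computation the abstract refers to.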



Citations: 9

Inductive sequentialization of asynchronous programs

Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2020-06-11 DOI: 10.1145/3385412.3385980

Bernhard Kragl, C. Enea, T. Henzinger, Suha Orhun Mutluergil, S. Qadeer

Asynchronous programs are notoriously difficult to reason about because they spawn computation tasks which take effect asynchronously in a nondeterministic way. Devising inductive invariants for such programs requires understanding and stating complex relationships between an unbounded number of computation tasks in arbitrarily long executions. In this paper, we introduce inductive sequentialization, a new proof rule that sidesteps this complexity via a sequential reduction, a sequential program that captures every behavior of the original program up to reordering of coarse-grained commutative actions. A sequential reduction of a concurrent program is easy to reason about since it corresponds to a simple execution of the program in an idealized synchronous environment, where processes act in a fixed order and at the same speed. We have implemented and integrated our proof rule in the CIVL verifier, allowing us to provably derive fine-grained implementations of asynchronous programs. We have successfully applied our proof rule to a diverse set of message-passing protocols, including leader election protocols, two-phase commit, and Paxos.
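
The phrase "up to reordering of commutative actions" can be illustrated with a toy model. This is only a sketch under stated assumptions, not the proof rule or anything from the CIVL verifier: the broadcast protocol, the `run` and `broadcast_actions` helpers, and the brute-force enumeration are all hypothetical. Each node sends its value to every node, and each node decides on the maximum value it received. Because message deliveries commute, every asynchronous delivery order ends in the same state as the single fixed-order (sequential) execution.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# A delivery action: (receiver, value) appends value to the receiver's inbox.
Action = Tuple[int, int]


def broadcast_actions(values: Sequence[int]) -> List[Action]:
    """All deliveries of a toy broadcast, listed in a fixed sequential order:
    sender 0 delivers to every node, then sender 1, and so on."""
    n = len(values)
    return [(receiver, values[sender])
            for sender in range(n)
            for receiver in range(n)]


def run(actions: Sequence[Action], num_nodes: int) -> List[int]:
    """Execute deliveries in the given order; each node then decides on the
    maximum of its inbox, which does not depend on arrival order."""
    inboxes: List[List[int]] = [[] for _ in range(num_nodes)]
    for receiver, value in actions:
        inboxes[receiver].append(value)
    return [max(box) for box in inboxes]


def all_orders_match_sequential(values: Sequence[int]) -> bool:
    """Brute-force check that every delivery order reaches the same decisions
    as the fixed sequential order (feasible only for tiny inputs)."""
    actions = broadcast_actions(values)
    sequential = run(actions, len(values))
    return all(run(order, len(values)) == sequential
               for order in permutations(actions))
```

The enumeration grows factorially and is there only to make the claim checkable on two or three nodes; the point of a proof rule such as the one described above is to justify reasoning about the one sequential order without enumerating the others.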



Citations: 18