
Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation: Latest Publications

Task parallel assembly language for uncompromising parallelism
Mike Rainey, Kyle C. Hale, Nikos Hardavellas, Simone Campanoni, Umut A. Acar
Achieving parallel performance and scalability involves making compromises between parallel and sequential computation. If not contained, the overheads of parallelism can easily outweigh its benefits, sometimes by orders of magnitude. Today, we expect programmers to implement this compromise by optimizing their code manually. This process is labor intensive, requires deep expertise, and reduces code quality. Recent work on heartbeat scheduling shows a promising approach that manifests the potentially vast amounts of available, latent parallelism, at a regular rate, based on even beats in time. The idea is to amortize the overheads of parallelism over the useful work performed between the beats. Heartbeat scheduling is promising in theory, but the reality is complicated: it has no known practical implementation. In this paper, we propose a practical approach to heartbeat scheduling that involves equipping the assembly language with a small set of primitives. These primitives leverage existing kernel and hardware support for interrupts to allow parallelism to remain latent, until a heartbeat, when it can be manifested with low cost. Our Task Parallel Assembly Language (TPAL) is a compact, RISC-like assembly language. We specify TPAL through an abstract machine and implement the abstract machine as compiler transformations for C/C++ code and a specialized run-time system. We present an evaluation on both the Linux and the Nautilus kernels, considering a range of heartbeat interrupt mechanisms. The evaluation shows that TPAL can dramatically reduce the overheads of parallelism without compromising scalability.
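The core idea of heartbeat scheduling can be illustrated with a toy simulation (this is not TPAL itself; the heartbeat period and spawn cost below are invented for illustration): parallelism stays latent and is manifested ("promoted") at most once per beat, so task-creation overhead is amortized over the useful work performed between beats.

```python
# Toy simulation of heartbeat scheduling: latent parallelism is promoted to a
# real task only when a heartbeat fires, bounding the relative overhead by
# SPAWN_COST / HEARTBEAT instead of paying SPAWN_COST per iteration.

HEARTBEAT = 100   # work units between beats (assumed period)
SPAWN_COST = 15   # cost charged for manifesting one parallel task (assumed)

def run(total_work, heartbeat=HEARTBEAT):
    promotions = 0
    work_since_beat = 0
    for _ in range(total_work):
        work_since_beat += 1            # one unit of useful sequential work
        if work_since_beat == heartbeat:
            promotions += 1             # latent parallelism manifested here
            work_since_beat = 0
    relative_overhead = promotions * SPAWN_COST / total_work
    return promotions, relative_overhead
```

Naive scheduling would pay the spawn cost on every iteration (relative overhead 15 in these units); with a heartbeat of 100, the simulated overhead drops to 0.15.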
DOI: https://doi.org/10.1145/3453483.3460969 · Published 2021-06-19
Citations: 8
CoStar: a verified ALL(*) parser
Sam Lasser, Chris Casinghino, Kathleen Fisher, Cody Roux
Parsers are security-critical components of many software systems, and verified parsing therefore has a key role to play in secure software design. However, existing verified parsers for context-free grammars are limited in their expressiveness, termination properties, or performance characteristics. They are only compatible with a restricted class of grammars, they are not guaranteed to terminate on all inputs, or they are not designed to be performant on grammars for real-world programming languages and data formats. In this work, we present CoStar, a verified parser that addresses these limitations. The parser is implemented with the Coq Proof Assistant and is based on the ALL(*) parsing algorithm. CoStar is sound and complete for all non-left-recursive grammars; it produces a correct parse tree for its input whenever such a tree exists, and it correctly detects ambiguous inputs. CoStar also provides strong termination guarantees; it terminates without error on all inputs when applied to a non-left-recursive grammar. Finally, CoStar achieves linear-time performance on a range of unambiguous grammars for commonly used languages and data formats.
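The grammar class CoStar targets can be illustrated with a toy recursive-descent parser for the non-left-recursive grammar `E -> NUM '+' E | NUM` (a sketch only; CoStar implements ALL(*) in Coq, not this algorithm). On such grammars a parser can terminate on every input, returning a parse tree exactly when one exists.

```python
# Toy parser for the non-left-recursive grammar E -> NUM '+' E | NUM.
# Returns a parse tree when the input is derivable, and None otherwise;
# because the grammar has no left recursion, every call terminates.

def parse_expr(tokens, i=0):
    if i >= len(tokens) or not tokens[i].isdigit():
        return None                       # no derivation from this position
    node, i = ("num", tokens[i]), i + 1
    if i < len(tokens) and tokens[i] == "+":
        rest = parse_expr(tokens, i + 1)  # recurse on strictly later input
        if rest is None:
            return None
        sub, j = rest
        return ("plus", node, sub), j
    return node, i

def parse(tokens):
    result = parse_expr(tokens)
    return result[0] if result and result[1] == len(tokens) else None
```

A left-recursive rule like `E -> E '+' NUM` would send this style of parser into infinite recursion, which is why CoStar's guarantees are stated for non-left-recursive grammars.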
DOI: https://doi.org/10.1145/3453483.3454053 · Published 2021-06-19
Citations: 6
Distance-in-time versus distance-in-space
M. Kandemir, Xulong Tang, Hui Zhao, Jihyun Ryoo, M. Karakoy
Cache behavior is one of the major factors that influence the performance of applications. Most of the existing compiler techniques that target cache memories focus exclusively on reducing data reuse distances in time (DIT). However, current manycore systems employ distributed on-chip caches that are connected using an on-chip network. As a result, a reused data element/block needs to travel over this on-chip network, and the distance to be traveled -- reuse distance in space (DIS) -- can be as influential in dictating application performance as reuse DIT. This paper represents the first attempt at defining a compiler framework that accommodates both DIT and DIS. Specifically, it first classifies data reuses into four groups: G1: (low DIT, low DIS), G2: (high DIT, low DIS), G3: (low DIT, high DIS), and G4: (high DIT, high DIS). Then, observing that reuses in G1 represent the ideal case and there is nothing much to be done in computations in G4, it proposes a "reuse transfer" strategy that transfers select reuses between G2 and G3, eventually, transforming each reuse to either G1 or G4. Finally, it evaluates the proposed strategy using a set of 10 multithreaded applications. The collected results reveal that the proposed strategy reduces parallel execution times of the tested applications between 19.3% and 33.3%.
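The paper's four-group taxonomy can be sketched as a simple classifier (the thresholds below are invented; the paper works on compiler-analyzed reuses, not concrete numbers):

```python
# Toy classifier for the paper's reuse groups. DIT = reuse distance in time
# (accesses between reuses); DIS = reuse distance in space (on-chip network
# hops between the reusing cores). Thresholds are assumptions for illustration.

DIT_HIGH = 1000
DIS_HIGH = 4

def classify(dit, dis):
    if dit < DIT_HIGH and dis < DIS_HIGH:
        return "G1"   # ideal case: nothing to do
    if dit >= DIT_HIGH and dis < DIS_HIGH:
        return "G2"   # candidate for "reuse transfer" toward G1
    if dit < DIT_HIGH and dis >= DIS_HIGH:
        return "G3"   # candidate for "reuse transfer" toward G1
    return "G4"       # high in both dimensions: little to be done
```

The proposed strategy transfers select reuses between G2 and G3 so that each ends up in G1 (ideal) or G4 (hopeless), concentrating optimization effort where it pays off.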
DOI: https://doi.org/10.1145/3453483.3454069 · Published 2021-06-18
Citations: 1
Incremental whole-program analysis in Datalog with lattices
Tamás Szabó, Sebastian Erdweg, Gábor Bergmann
Incremental static analyses provide up-to-date analysis results in time proportional to the size of a code change, not the entire code base. This promises fast feedback to programmers in IDEs and when checking in commits. However, existing incremental analysis frameworks fail to deliver on this promise for whole-program lattice-based data-flow analyses. In particular, prior Datalog-based frameworks yield good incremental performance only for intra-procedural analyses. In this paper, we first present a methodology to empirically test if a computation is amenable to incrementalization. Using this methodology, we find that incremental whole-program analysis may be possible. Second, we present a new incremental Datalog solver called LADDDER to eliminate the shortcomings of prior Datalog-based analysis frameworks. Our Datalog solver uses a non-standard aggregation semantics which allows us to loosen monotonicity requirements on analyses and to improve the performance of lattice aggregators considerably. Our evaluation on real-world Java code confirms that LADDDER provides up-to-date points-to, constant propagation, and interval information in milliseconds.
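The kind of lattice aggregation such a solver maintains can be sketched with a minimal constant-propagation lattice (bottom < constants < top); this is an illustration of the concept, not LADDDER's aggregation semantics:

```python
# Minimal constant-propagation lattice join: BOT is the identity, equal
# constants stay, and conflicting constants go to TOP ("not a constant").
# An incremental Datalog solver keeps such aggregates up to date as facts
# are added and removed.

from functools import reduce

BOT, TOP = "bot", "top"

def join(a, b):
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP

def aggregate(facts):
    # the lattice aggregate over all dataflow facts for one variable
    return reduce(join, facts, BOT)
```

The incrementalization challenge the paper addresses is that when a fact is *removed*, a non-monotone aggregate like this may need to shrink back from TOP, which naive Datalog maintenance does not support efficiently.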
DOI: https://doi.org/10.1145/3453483.3454026 · Published 2021-06-18
Citations: 12
Wire sorts: a language abstraction for safe hardware composition
Michael Christensen, T. Sherwood, Jonathan Balkind, B. Hardekopf
Effective digital hardware design fundamentally requires decomposing a design into a set of interconnected modules, each a distinct unit of computation and state. However, naively connecting hardware modules leads to real-world pathological cases which are surprisingly far from obvious when looking at the interfaces alone and which are very difficult to debug after synthesis. We show for the first time that it is possible to soundly abstract even complex combinational dependencies of arbitrary hardware modules through the assignment of IO ports to one of four new sorts which we call: to-sync, to-port, from-sync, and from-port. This new taxonomy, and the reasoning it enables, facilitates modularity by escalating problematic aspects of module input/output interaction to the language-level interface specification. We formalize and prove the soundness of our new wire sorts, implement them in a practical hardware description language, and demonstrate they can be applied and even inferred automatically at scale. Through an examination of the BaseJump STL, the OpenPiton manycore research platform, and a complete RISC-V implementation, we find that even on our biggest design containing 1.5 million primitive gates, analysis takes less than 31 seconds; that across 172 unique modules analyzed, the inferred sorts are widely distributed across our taxonomy; and that by using wire sorts, our tool is 2.6–33.9x faster at finding loops than standard synthesis-time cycle detection.
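A loose sketch of the composition check this taxonomy enables (the connection rule and module names below are invented for illustration, not the paper's exact formalism): an output sorted "from-port" depends combinationally on some input, while "from-sync" is registered, so a cycle passing only through "port"-sorted endpoints is a pathological combinational loop.

```python
# Detect combinational loops in a module composition. Only connections whose
# source is "from-port" and whose sink is "to-port" propagate a value
# combinationally; "sync"-sorted endpoints break the cycle at a register.

def has_combinational_loop(edges):
    # edges: (src_module, src_sort, dst_module, dst_sort)
    graph = {}
    for src, ssort, dst, dsort in edges:
        if ssort == "from-port" and dsort == "to-port":
            graph.setdefault(src, []).append(dst)
    visited, stack = set(), set()
    def dfs(n):                       # simple DFS cycle detection
        if n in stack:
            return True
        if n in visited:
            return False
        visited.add(n)
        stack.add(n)
        found = any(dfs(m) for m in graph.get(n, []))
        stack.discard(n)
        return found
    return any(dfs(n) for n in list(graph))
```

The point of the sort annotations is that this check needs only the interfaces, not the modules' internals, which is what makes the analysis fast even on million-gate designs.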
DOI: https://doi.org/10.1145/3453483.3454037 · Published 2021-06-18
Citations: 8
Filling typed holes with live GUIs
Cyrus Omar, David Moon, Andrew Blinn, Ian Voysey, N. Collins, Ravi Chugh
Text editing is powerful, but some types of expressions are more naturally represented and manipulated graphically. Examples include expressions that compute colors, music, animations, tabular data, plots, diagrams, and other domain-specific data structures. This paper introduces live literals, or livelits, which allow clients to fill holes of types like these by directly manipulating a user-defined GUI embedded persistently into code. Uniquely, livelits are compositional: a livelit GUI can itself embed spliced expressions, which are typed, lexically scoped, and can in turn embed other livelits. Livelits are also uniquely live: a livelit can provide continuous feedback about the run-time implications of the client’s choices even when splices mention bound variables, because the system continuously gathers closures associated with the hole that the livelit is filling. We integrate livelits into Hazel, a live hole-driven programming environment, and describe case studies that exercise these novel capabilities. We then define a simply typed livelit calculus, which specifies how livelits operate as live graphical macros. The metatheory of macro expansion has been mechanized in Agda.
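The macro-expansion view of livelits can be sketched as follows (all names here are invented for illustration; Hazel's livelits are defined in its own language, not Python): the GUI's state plus the client's typed, lexically scoped splices expand to an ordinary expression.

```python
# Hypothetical sketch of a livelit as a graphical macro: GUI state and spliced
# expressions expand to plain code. Splices are kept as expression strings
# because they may mention variables bound at the hole's location.

def expand_color_livelit(rgb_state, splices):
    # rgb_state: the values picked via the GUI's sliders
    # splices: expressions the client typed into holes embedded in the GUI
    r, g, b = rgb_state
    alpha = splices.get("alpha", "1.0")   # default when the splice is empty
    return f"rgba({r}, {g}, {b}, {alpha})"
```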
DOI: https://doi.org/10.1145/3453483.3454059 · Published 2021-06-18
Citations: 19
Polynomial reachability witnesses via Stellensätze
A. Asadi, K. Chatterjee, Hongfei Fu, A. K. Goharshady, Mohammad Mahdavi
We consider the fundamental problem of reachability analysis over imperative programs with real variables. Previous works that tackle reachability are either unable to handle programs consisting of general loops (e.g. symbolic execution), or lack completeness guarantees (e.g. abstract interpretation), or are not automated (e.g. incorrectness logic). In contrast, we propose a novel approach for reachability analysis that can handle general and complex loops, is complete, and can be entirely automated for a wide family of programs. Through the notion of Inductive Reachability Witnesses (IRWs), our approach extends ideas from both invariant generation and termination to reachability analysis. We first show that our IRW-based approach is sound and complete for reachability analysis of imperative programs. Then, we focus on linear and polynomial programs and develop automated methods for synthesizing linear and polynomial IRWs. In the linear case, we follow the well-known approaches using Farkas' Lemma. Our main contribution is in the polynomial case, where we present a push-button semi-complete algorithm. We achieve this using a novel combination of classical theorems in real algebraic geometry, such as Putinar's Positivstellensatz and Hilbert's Strong Nullstellensatz. Finally, our experimental results show we can prove complex reachability objectives over various benchmarks that were beyond the reach of previous methods.
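What an inductive reachability witness certifies can be checked concretely on a tiny loop (a numeric sanity check only; the paper synthesizes such witnesses symbolically via Farkas' Lemma and Positivstellensätze, and the program and witness here are invented):

```python
# Toy check of an inductive reachability witness for the loop
#   while x > 0: x := x - 1
# with target x == 0. The witness f(x) = x is nonnegative on reachable states
# and strictly decreases on every iteration, so the target is reached.

def f(x):
    return x

def witness_holds(x0):
    x = x0
    while x > 0:
        nxt = x - 1
        # the inductive conditions a witness must satisfy at each step
        assert f(nxt) < f(x) and f(x) >= 0
        x = nxt
    return x == 0
```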
DOI: https://doi.org/10.1145/3453483.3454076 · Published 2021-06-18
Citations: 13
Chianina: an evolving graph system for flow- and context-sensitive analyses of million lines of C code
Zhiqiang Zuo, Yiyu Zhang, Qiuhong Pan, S. Lu, Yue Li, Linzhang Wang, Xuandong Li, G. Xu
Sophisticated static analysis techniques often have complicated implementations, much of which provides logic for tuning and scaling rather than basic analysis functionalities. This tight coupling of basic algorithms with special treatments for scalability makes an analysis implementation hard to (1) make correct, (2) understand/work with, and (3) reuse for other clients. This paper presents Chianina, a graph system we developed for fully context- and flow-sensitive analysis of large C programs. Chianina overcomes these challenges by allowing the developer to provide only the basic algorithm of an analysis and pushing the tuning/scaling work to the underlying system. Key to the success of Chianina is (1) an evolving graph formulation of flow sensitivity and (2) the leverage of out-of-core, disk support to deal with memory blowup resulting from context sensitivity. We implemented three context- and flow-sensitive analyses on top of Chianina and scaled them to large C programs like Linux (17M LoC) on a single commodity PC.
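The "evolving graph" formulation of flow sensitivity can be sketched as maintaining a new version of the analysis graph per statement, rather than one monolithic graph per program point (the statement form and strong-update rule below are simplified inventions, not Chianina's actual representation):

```python
# Toy illustration of flow sensitivity as an evolving graph: each statement
# produces a new points-to-graph version; earlier versions persist, so the
# analysis result at any program point remains available.

def analyze(stmts):
    # stmts: (lhs, rhs) pairs modeling strong-update assignments like p = &x
    graph, snapshots = {}, []
    for lhs, rhs in stmts:
        graph = {**graph, lhs: {rhs}}   # fresh version; prior dicts unchanged
        snapshots.append(graph)
    return snapshots
```

A system like Chianina's job is then to keep such per-point versions tractable at scale, spilling to disk when context sensitivity blows up memory.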
DOI: https://doi.org/10.1145/3453483.3454085 · Published 2021-06-18
Citations: 9
Revamping hardware persistency models: view-based and axiomatic persistency models for Intel-x86 and Armv8
K. Cho, Sung-Hwan Lee, Azalea Raad, Jeehoon Kang
Non-volatile memory (NVM) is a cutting-edge storage technology that promises the performance of DRAM with the durability of SSD. Recent work has proposed several persistency models for mainstream architectures such as Intel-x86 and Armv8, describing the order in which writes are propagated to NVM. However, these models have several limitations; most notably, they either lack operational models or do not support persistent synchronization patterns. We close this gap by revamping the existing persistency models. First, inspired by the recent work on promising semantics, we propose a unified operational style for describing persistency using views, and develop view-based operational persistency models for Intel-x86 and Armv8, thus presenting the first operational model for Armv8 persistency. Next, we propose a unified axiomatic style for describing hardware persistency, allowing us to recast and repair the existing axiomatic models of Intel-x86 and Armv8 persistency. We prove that our axiomatic models are equivalent to the authoritative semantics reviewed by Intel and Arm engineers. We further prove that each axiomatic hardware persistency model is equivalent to its operational counterpart. Finally, we develop a persistent model checking algorithm and tool, and use it to verify several representative examples.
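The durability problem these persistency models govern can be made concrete with a toy executable model: stores first land in a volatile buffer and may drain to NVM in an uncontrolled order unless explicitly persisted. The class below is a hypothetical sketch (`persist` stands in for a clwb-plus-sfence pair, and `crash` makes the drain nondeterminism explicit); it is not the paper's formal Px86 or PArmv8 semantics.

```python
# Toy model of buffered persistency: volatile buffer vs. durable NVM.
# Illustrative only; not the paper's view-based or axiomatic semantics.

class ToyPMem:
    def __init__(self):
        self.buf = {}   # volatile (cache/store buffer), lost on crash
        self.nvm = {}   # durable memory, survives a crash

    def store(self, loc, val):
        self.buf[loc] = val

    def persist(self, loc):
        # Stand-in for clwb + sfence: after this returns, `loc` is
        # guaranteed durable.
        if loc in self.buf:
            self.nvm[loc] = self.buf[loc]

    def crash(self, also_persisted=()):
        # At a crash, an arbitrary subset of unpersisted stores may have
        # drained to NVM on their own; the caller names that subset to
        # make the nondeterminism explicit. Returns the recovery state.
        for loc in also_persisted:
            if loc in self.buf:
                self.nvm[loc] = self.buf[loc]
        self.buf = {}
        return self.nvm
```

Running the classic flag idiom through this model shows why persist ordering matters: writing `data` then `valid` without persisting in between admits a crash state where `valid` is durable but `data` is not, while persisting `data` before publishing `valid` rules that state out — the kind of persistent synchronization pattern the paper's models must support.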
K. Cho, Sung-Hwan Lee, Azalea Raad, Jeehoon Kang. "Revamping hardware persistency models: view-based and axiomatic persistency models for Intel-x86 and Armv8." Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 18, 2021. DOI: 10.1145/3453483.3454027
Citations: 17
IOOpt: automatic derivation of I/O complexity bounds for affine programs
Auguste Olivry, Guillaume Iooss, Nicolas Tollenaere, A. Rountev, P. Sadayappan, F. Rastello
Evaluating the complexity of an algorithm is an important step when developing applications, as it impacts both its time and energy performance. Computational complexity, which is the number of dynamic operations regardless of the execution order, is easy to characterize for affine programs. Data movement (or, I/O) complexity is more complex to evaluate as it refers, when considering all possible valid schedules, to the minimum required number of I/O between a slow (e.g. main memory) and a fast (e.g. local scratchpad) storage location. This paper presents IOOpt, a fully automated tool that automatically bounds the data movement of an affine (tilable) program. Given a tilable program described in a DSL, it automatically computes: 1. a lower bound of the I/O complexity as a symbolic expression of the cache size and program parameters; 2. an upper bound that allows one to assess the tightness of the lower bound; 3. a tiling recommendation (loop permutation and tile sizes) that matches the upper bound. For the lower bound algorithm, which can be applied to any affine program, a substantial effort has been made to provide bounds that are as tight as possible for neural networks: in particular, it extends the previous work of Olivry et al. to handle multi-dimensional reductions and expose the constraints associated with small dimensions that are present in convolutions. For the upper bound algorithm that reasons on the tile band of the program (e.g. output of a polyhedral compiler such as PluTo), the algebraic computations involved have been tuned to behave well on tensor computations such as direct tensor contractions or direct convolutions. As a bonus, the upper bound algorithm, which has been extended to multi-level cache, can provide the programmer with a useful tiling recommendation. We demonstrate the effectiveness of our tool by deriving the symbolic lower and upper bounds for several tensor contraction and convolution kernels. Then we evaluate numerically the tightness of our bound using the convolution layers of Yolo9000 and representative tensor contractions from the TCCG benchmark suite. Finally, we show the pertinence of our I/O complexity model by reporting the running time of the recommended tiled code for the convolution layers of Yolo9000.
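For intuition about the kind of bound IOOpt derives automatically, here is a hand-worked instance for the simplest affine kernel: n×n matrix multiply with a fast memory of size s. The known lower bound is about 2n³/√s, and a T×T tiling with 3T² ≤ s comes within a small constant factor of it. The function names and the toy load/store cost model below are this sketch's assumptions, not IOOpt's output or API.

```python
# Hand-derived I/O bounds for n x n matmul with fast memory of size s.
# Lower bound: the classic Hong-Kung-style 2*n^3/sqrt(s) (with the
# tightened constant from later work). Upper bound: a toy traffic count
# for a T x T tiled i-j-k schedule. Illustrative only.
from math import isqrt, sqrt

def matmul_io_lower_bound(n, s):
    # Minimum slow<->fast traffic any valid schedule of C += A*B incurs.
    return 2 * n**3 / sqrt(s)

def tiled_matmul_io(n, t):
    # Traffic of a T-tiled schedule: each C tile is read and written once
    # (2*t*t elements), and for each of the ceil(n/t) k-steps one A tile
    # and one B tile are loaded (another 2*t*t elements per step).
    ntiles = -(-n // t)  # ceil(n / t)
    return ntiles**2 * (2 * t * t + ntiles * 2 * t * t)

def pick_tile(s):
    # Largest T such that three T x T tiles fit in fast memory at once.
    return isqrt(s // 3)
```

For n = 1024 and s = 3072 words, `pick_tile` chooses T = 32, and the tiled schedule's traffic (about 2n² + 2n³/T) lands within a factor of two of the lower bound — the same tightness check that IOOpt performs symbolically for arbitrary affine programs.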
Auguste Olivry, Guillaume Iooss, Nicolas Tollenaere, A. Rountev, P. Sadayappan, F. Rastello. "IOOpt: automatic derivation of I/O complexity bounds for affine programs." Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 18, 2021. DOI: 10.1145/3453483.3454103
Citations: 7
Journal: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation