AST vs. Bytecode: Interpreters in the Age of Meta-Compilation
Octave Larose, Sophie Kaleba, Humphrey Burchell, Stefan Marr
Thanks to partial evaluation and meta-tracing, it became practical to build language implementations that reach state-of-the-art peak performance by implementing only an interpreter. Systems such as RPython and GraalVM provide components such as a garbage collector and just-in-time compiler in a language-agnostic manner, greatly reducing implementation effort. However, meta-compilation-based language implementations still need to improve further to reach the low memory use and fast warmup behavior that custom-built systems provide. A key element in this endeavor is interpreter performance. Folklore tells us that bytecode interpreters are superior to abstract-syntax-tree (AST) interpreters both in terms of memory use and run-time performance. This work assesses the trade-offs between AST and bytecode interpreters to verify whether these common assumptions hold in the context of meta-compilation systems. We implemented four interpreters: an AST and a bytecode interpreter on each of RPython and GraalVM. We kept the differences between the interpreters as small as feasible to be able to evaluate interpreter performance, peak performance, warmup, memory use, and the impact of individual optimizations. Our results show that both systems indeed reach performance close to Node.js/V8. Looking at interpreter-only performance, our AST interpreters are on par with, or even slightly faster than, their bytecode counterparts. After just-in-time compilation, the results are roughly on par. This means bytecode interpreters do not have their widely assumed performance advantage. We can confirm, however, that bytecodes are more compact in memory than ASTs, which becomes relevant for larger applications. For smaller applications, though, we noticed that bytecode interpreters allocate more memory, because boxing avoidance is less applicable and because the bytecode interpreter structure itself requires memory, e.g., for a reified stack. Our results show AST interpreters to be competitive on top of meta-compilation systems. Together with their possible engineering benefits, they should thus not be discounted so easily in favor of bytecode interpreters.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622808
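To make the AST-versus-bytecode distinction concrete, here is a minimal sketch (not taken from the paper; the tiny expression language and all names are invented) of the same program run by an AST interpreter, where each node evaluates itself, and by a bytecode interpreter, with a flat dispatch loop and an explicit operand stack, the "reified stack" the abstract mentions as a memory cost.

```python
# Illustrative sketch: the same tiny expression language executed by an AST
# interpreter and by a bytecode interpreter with a reified operand stack.
from dataclasses import dataclass


# --- AST interpreter: each node knows how to evaluate itself -------------
@dataclass
class Lit:
    value: int
    def eval(self):
        return self.value

@dataclass
class Add:
    left: object
    right: object
    def eval(self):
        return self.left.eval() + self.right.eval()


# --- Bytecode interpreter: flat instruction list + dispatch loop ---------
PUSH, ADD = 0, 1

def compile_to_bytecode(node, code=None):
    """Flatten an AST into a list of (opcode, operand) pairs."""
    code = [] if code is None else code
    if isinstance(node, Lit):
        code.append((PUSH, node.value))
    else:  # Add
        compile_to_bytecode(node.left, code)
        compile_to_bytecode(node.right, code)
        code.append((ADD, None))
    return code

def run_bytecode(code):
    stack = []  # the explicit (reified) operand stack
    for op, arg in code:
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack.pop()


expr = Add(Lit(1), Add(Lit(2), Lit(3)))
assert expr.eval() == run_bytecode(compile_to_bytecode(expr)) == 6
```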
A Cocktail Approach to Practical Call Graph Construction
Yuandao Cai, Charles Zhang
After decades of research, constructing call graphs for modern C-based software remains either imprecise or inefficient when scaling up to the ever-growing complexity. The main culprit is the difficulty of resolving function pointers, as precise pointer analyses are cubic in nature and become exponential when considering calling contexts. This paper takes a practical stance by first conducting a comprehensive empirical study of function pointer manipulations in the wild. By investigating 5355 indirect calls in five popular open-source systems, we conclude that, instead of the uniform treatments of function pointers used in the past, a cocktail approach can be more effective in “squeezing” the number of difficult pointers to a minimum using a potpourri of cheap methods. In particular, we decompose the costs of constructing highly precise call graphs of big code by tailoring several increasingly precise algorithms and synergizing them into a concerted workflow. As a result, many indirect calls can be precisely resolved in an efficient and principled fashion, thereby reducing the final, expensive refinements. This is, in spirit, similar to the well-known cocktail medical therapy. The results are encouraging: our implemented prototype, Coral, achieves precision similar to the previous field-, flow-, and context-sensitive Andersen-style call graph construction, yet scales up to millions of lines of code for the first time, to the best of our knowledge. Moreover, by evaluating the produced call graphs through the lens of downstream clients (i.e., use-after-free detection, thin slicing, and directed grey-box fuzzing), the results show that Coral can dramatically improve their effectiveness for better vulnerability hunting, understanding, and reproduction. More excitingly, we found twelve confirmed bugs (six impacted by indirect calls) in popular systems (e.g., MariaDB), spanning multiple historical versions.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622833
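As a rough illustration of the staged "cocktail" idea, the sketch below (hypothetical data and stages, not Coral's actual algorithms) resolves indirect call sites with cheap filters first, address-taken functions and type-signature matching, and defers only the still-ambiguous residue to an expensive pointer analysis.

```python
# Illustrative sketch: resolve indirect calls with a cocktail of increasingly
# precise, increasingly expensive filters, deferring only the leftovers.

functions = {           # function name -> parameter-type signature (toy data)
    "read_cb":  ("int", "char*"),
    "write_cb": ("int", "char*"),
    "log_cb":   ("char*",),
}
address_taken = {"read_cb", "write_cb"}          # found by a cheap syntactic scan
call_sites = {"site1": ("int", "char*"),          # call site -> argument types
              "site2": ("double",)}

def resolve(site_sig):
    # Stage 1: only address-taken functions can be indirect-call targets.
    cands = set(address_taken)
    # Stage 2: keep only targets whose signature matches the call site.
    cands = {f for f in cands if functions[f] == site_sig}
    if len(cands) <= 1:
        return cands, "resolved cheaply"
    # Stage 3 (not shown): fall back to an expensive, precise pointer
    # analysis only for the few sites the cheap stages could not settle.
    return cands, "deferred to precise analysis"

for site, sig in call_sites.items():
    print(site, resolve(sig))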
Synthesizing Efficient Memoization Algorithms
Yican Sun, Xuanyu Peng, Yingfei Xiong
In this paper, we propose an automated approach to finding correct and efficient memoization algorithms from a given declarative specification. This problem has two major challenges: (i) a memoization algorithm is too large to be handled by conventional program synthesizers; (ii) we need to guarantee the efficiency of the memoization algorithm. To address these challenges, we structure the synthesis of memoization algorithms by introducing the local objective function and the memoization partition function, and reduce the synthesis task to two smaller, independent program synthesis tasks. Moreover, the number of distinct outputs of the function synthesized in the second task determines the efficiency of the synthesized memoization algorithm, so we only need to minimize the number of distinct output values of that function. However, the generated synthesis task is still too complex for existing synthesizers. Thus, we propose a novel synthesis algorithm that combines deductive and inductive methods to solve these tasks. To evaluate our algorithm, we collect 42 real-world benchmarks from LeetCode, the National Olympiad in Informatics in Provinces-Junior (a nationwide algorithmic programming contest in China), and previous approaches. Our approach successfully synthesizes memoization algorithms for 39/42 problems in a reasonable time, outperforming the baselines.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622800
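The role of a memoization partition function can be illustrated with a small sketch (hypothetical example, not the paper's synthesizer): a naive recursive specification becomes an efficient memoization algorithm once results are cached under a key that collapses search states sharing the same optimal sub-result. Fewer distinct key values mean a smaller cache and a faster algorithm, which is the quantity the paper minimizes.

```python
# Illustrative sketch: memoizing a declarative search spec on a partition
# function that projects the state to the part that matters.
import functools

items = [(2, 3), (3, 4), (4, 5), (5, 8)]   # (weight, value) pairs

def memoize_by(key_fn):
    """Memoize a function on key_fn(state) rather than the full state."""
    def wrap(f):
        cache = {}
        @functools.wraps(f)
        def g(*state):
            k = key_fn(*state)
            if k not in cache:
                cache[k] = f(*state)
            return cache[k]
        return g
    return wrap

# Partition function: of the whole search state, only (index, capacity)
# determines the best achievable value from here on; `chosen` is irrelevant.
@memoize_by(lambda i, cap, chosen: (i, cap))
def best(i, cap, chosen):
    if i == len(items) or cap == 0:
        return 0
    w, v = items[i]
    skip = best(i + 1, cap, chosen)
    take = v + best(i + 1, cap - w, chosen + [i]) if w <= cap else 0
    return max(skip, take)

print(best(0, 9, []))   # 0/1 knapsack optimum for capacity 9 -> 13
```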
Compositional Verification of Efficient Masking Countermeasures against Side-Channel Attacks
Pengfei Gao, Yedi Zhang, Fu Song, Taolue Chen, Francois-Xavier Standaert
Masking is one of the most effective countermeasures for securely implementing cryptographic algorithms against power side-channel attacks, whose design, however, turns out to be intricate and error-prone. While techniques have been proposed to rigorously verify implementations of cryptographic algorithms, they are currently limited in scalability. To address this issue, compositional approaches have been investigated, but so far they fail to prove the security of recent efficient implementations. To fill this gap, we propose a novel compositional verification approach. In particular, we introduce two new language-level security notions, based on which we propose composition strategies and verification algorithms. Our approach is able to prove the security of efficient implementations, which prior compositional approaches cannot. We implement our approach in a tool, CONVINCE, and conduct extensive experiments to confirm its efficacy. We also use CONVINCE to further explore the design space of the AES Sbox with least refreshing by replacing its implementation of finite-field multiplication with more efficient counterparts. We automatically prove leakage-freeness of these new versions. As a result, we can save 1,600 random values and 3,200 XOR operations in the state-of-the-art AES implementation.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622862
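To show what verifying a masking countermeasure actually checks, here is a brute-force sketch (a toy first-order gadget with invented names, vastly simpler than CONVINCE): every single observable intermediate value of a Boolean-masked computation must have a distribution, over the random masks, that is independent of the secret.

```python
# Illustrative sketch: exhaustive first-order probing-security check for a
# tiny Boolean-masked gadget.
import itertools
from collections import Counter

def masked_not(secret_bit, r):
    """Compute NOT(secret) on two Boolean shares (x0 XOR x1 = secret)."""
    x0, x1 = r, secret_bit ^ r          # share the secret with mask r
    y0, y1 = x0 ^ 1, x1                 # apply NOT to one share only
    return {"x0": x0, "x1": x1, "y0": y0, "y1": y1}

def first_order_secure(gadget):
    # For each intermediate, its distribution over the random mask must be
    # identical for secret=0 and secret=1.
    dists = {0: {}, 1: {}}
    for secret, r in itertools.product((0, 1), repeat=2):
        for name, val in gadget(secret, r).items():
            dists[secret].setdefault(name, Counter())[val] += 1
    return all(dists[0][n] == dists[1][n] for n in dists[0])

print(first_order_secure(masked_not))   # True: each single probe looks uniform
```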
Equality Saturation Theory Exploration à la Carte
Anjali Pal, Brett Saiki, Ryan Tjoa, Cynthia Richey, Amy Zhu, Oliver Flatt, Max Willsey, Zachary Tatlock, Chandrakana Nandi
Rewrite rules are critical in equality saturation, an increasingly popular technique in optimizing compilers, synthesizers, and verifiers. Unfortunately, developing high-quality rulesets is difficult and error-prone. Recent work on automatically inferring rewrite rules does not scale to large terms or grammars, and existing rule inference tools are monolithic and opaque. Equality saturation users therefore struggle to guide inference and incrementally construct rulesets. As a result, most users still manually develop and maintain rulesets. This paper proposes Enumo, a new domain-specific language for programmable theory exploration. Enumo provides a small set of core operators that enable users to strategically guide rule inference and incrementally build rulesets. Short Enumo programs easily replicate results from state-of-the-art tools, but Enumo programs can also scale to infer deeper rules from larger grammars than prior approaches. Its composable operators even facilitate developing new strategies for ruleset inference. We introduce a new fast-forwarding strategy that does not require evaluating terms in the target language, and can thus support domains that were out of scope for prior work. We evaluate Enumo and fast-forwarding across a variety of domains. Compared to state-of-the-art techniques, Enumo can synthesize better rulesets over a diverse set of domains, in some cases matching the effects of manually-developed rulesets in systems driven by equality saturation.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622834
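The core loop of ruleset inference by theory exploration can be sketched as follows (hypothetical and far simpler than Enumo): enumerate small terms from a grammar, fingerprint each term on random inputs, and propose rewrite rules between terms whose fingerprints agree. Real tools then validate or prove the candidates; the sketch stops at proposal.

```python
# Illustrative sketch: rule candidates from term enumeration + fingerprinting.
import itertools, random

random.seed(0)
SAMPLES = [{"x": random.randint(-50, 50), "y": random.randint(-50, 50)}
           for _ in range(64)]

def terms(depth):
    """Enumerate arithmetic terms over a tiny grammar up to a given depth."""
    if depth == 0:
        yield from ("x", "y", "0", "1")
        return
    yield from terms(depth - 1)
    for op in ("+", "*"):
        for a, b in itertools.product(list(terms(depth - 1)), repeat=2):
            yield f"({a} {op} {b})"

def fingerprint(term):
    # Evaluate the term on the shared random samples ("characteristic vector").
    return tuple(eval(term, {}, env) for env in SAMPLES)

buckets = {}
for t in terms(1):
    buckets.setdefault(fingerprint(t), []).append(t)

for group in buckets.values():
    rep = min(group, key=len)           # smallest term as the canonical form
    for t in group:
        if t != rep:
            print(f"candidate rule: {t} => {rep}")
```

Running it proposes familiar identities such as (x + 0) => x, (x * 1) => x, and (y + x) => (x + y), which is exactly the raw material a ruleset-inference strategy curates.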
Secure RDTs: Enforcing Access Control Policies for Offline Available JSON Data
Thierry Renaux, Sam Van den Vonder, Wolfgang De Meuter
Replicated Data Types (RDTs) are a type of data structure that can be replicated over a network, where each replica can be kept (eventually) consistent with the other replicas. They are used in applications with intermittent network connectivity, since local (offline) edits can later be merged with the other replicas. Applications that want to use RDTs often have an inherent security component that restricts data access for certain clients. However, access control for RDTs is difficult to enforce for clients that are not running within a secure environment, e.g., web applications where the client-side software can be freely tampered with. In essence, an application cannot prevent a client from reading data which they are not supposed to read, and any malicious changes will also affect well-behaved clients. This paper proposes Secure RDTs (SRDTs), a data type that specifies role-based access control for offline-available JSON data. In brief, a trusted application server specifies a security policy based on roles with read and write privileges for certain fields of an SRDT. The server enforces read privileges by projecting the data and security policy to omit any non-readable fields for the user's given role, and it acts as an intermediary to enforce write privileges. The approach is presented as an operational semantics engineered in PLT Redex, which is validated by formal proofs and randomised testing in Redex to ensure that the formal specification is secure.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622802
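A minimal sketch of the server-side idea (hypothetical policy and record formats, not the paper's operational semantics): the trusted server projects the JSON document so a client only ever receives fields its role may read, and checks every incoming write against the role's write privileges.

```python
# Illustrative sketch: role-based projection and write checks for JSON data.

policy = {
    "doctor":  {"read": {"name", "diagnosis", "notes"}, "write": {"diagnosis", "notes"}},
    "billing": {"read": {"name", "invoice"},            "write": {"invoice"}},
}

record = {"name": "Ada", "diagnosis": "flu", "notes": "rest", "invoice": 120}

def project(doc, role):
    """Drop every field the role is not allowed to read before replication."""
    readable = policy[role]["read"]
    return {k: v for k, v in doc.items() if k in readable}

def apply_write(doc, role, field, value):
    """Accept a client write only if the role has write privileges on the field."""
    if field not in policy[role]["write"]:
        raise PermissionError(f"role {role!r} may not write {field!r}")
    return {**doc, field: value}

print(project(record, "billing"))                 # {'name': 'Ada', 'invoice': 120}
record = apply_write(record, "doctor", "notes", "hydrate")
try:
    apply_write(record, "billing", "diagnosis", "cold")
except PermissionError as e:
    print(e)                                      # rejected by the server
```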
Synthesizing Precise Static Analyzers for Automatic Differentiation
Jacob Laurel, Siyuan Brant Qian, Gagandeep Singh, Sasa Misailovic
We present Pasado, a technique for synthesizing precise static analyzers for Automatic Differentiation. Our technique allows one to automatically construct a static analyzer specialized for the Chain Rule, Product Rule, and Quotient Rule computations for Automatic Differentiation in a way that abstracts all of the nonlinear operations of each respective rule simultaneously. By directly synthesizing an abstract transformer for the composite expressions of these three most common rules of AD, we are able to obtain significant precision improvement compared to prior works that compose standard abstract transformers suboptimally. We prove our synthesized static analyzers sound and additionally demonstrate the generality of our approach by instantiating these AD static analyzers with different nonlinear functions, different abstract domains (both intervals and zonotopes), and both forward-mode and reverse-mode AD. We evaluate Pasado on multiple case studies, namely soundly computing bounds on a neural network’s local Lipschitz constant, soundly bounding the sensitivities of financial models, certifying monotonicity, and lastly, bounding sensitivities of the solutions of differential equations from climate science and chemistry for verified ranges of initial conditions and parameters. The local Lipschitz constants computed by Pasado on our largest CNN are up to 2750× more precise compared to the existing state-of-the-art zonotope analysis. The bounds obtained on the sensitivities of the climate, chemical, and financial differential equation solutions are between 1.31× and 2.81× more precise (on average) compared to a state-of-the-art zonotope analysis.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622867
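For intuition, the sketch below shows the naive, composed interval transformer for the forward-mode product rule, the kind of baseline that Pasado's synthesized composite transformers tighten (interval domain only; function and variable names are illustrative, not from the paper).

```python
# Illustrative sketch: abstract transformer over intervals for the forward-mode
# product rule (u*v)' = u'*v + u*v', built by composing interval add/multiply.

def i_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def i_mul(a, b):
    ps = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(ps), max(ps))

def product_rule_transformer(u, du, v, dv):
    """Intervals for (u*v, (u*v)') given intervals for u, u', v, v'."""
    val = i_mul(u, v)
    deriv = i_add(i_mul(du, v), i_mul(u, dv))
    return val, deriv

# x in [1, 2] with dx/dx = 1; abstractly differentiate f(x) = x * x.
x, dx = (1.0, 2.0), (1.0, 1.0)
print(product_rule_transformer(x, dx, x, dx))
# ((1.0, 4.0), (2.0, 4.0)) -- the true derivative 2x over [1, 2] is soundly enclosed
```

The precision gap the paper targets comes from exactly this kind of composition: treating u'*v and u*v' independently ignores their correlation, which a transformer synthesized for the whole composite expression can exploit.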
Run-Time Prevention of Software Integration Failures of Machine Learning APIs
Chengcheng Wan, Yuhan Liu, Kuntai Du, Henry Hoffmann, Junchen Jiang, Michael Maire, Shan Lu
Due to under-specified interfaces, developers face challenges in correctly integrating machine learning (ML) APIs into software. Even when the ML API and the software are well designed on their own, the resulting application misbehaves when the API output is incompatible with the software. It is desirable to have an adapter that converts ML API output at run time to better fit the software's needs and prevent integration failures. In this paper, we conduct an empirical study to understand ML API integration problems in real-world applications. Guided by this study, we present SmartGear, a tool that automatically detects and converts mismatching or incorrect ML API output at run time, serving as a middle layer between ML API and software. Our evaluation on a variety of open-source applications shows that SmartGear detects 70% of incompatible API outputs and prevents 67% of potential integration failures, outperforming alternative solutions.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622806
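The adapter idea can be sketched in a few lines (hypothetical labels and API output shape; this is not SmartGear's detection logic): normalize the ML API's output into the format the application expects, and fail loudly when no safe conversion exists.

```python
# Illustrative sketch: a runtime adapter between an ML classification API and
# the application that consumes its output.

EXPECTED_LABELS = {"positive", "negative", "neutral"}
SYNONYMS = {"pos": "positive", "neg": "negative", "neu": "neutral"}

def adapt(api_output):
    """Normalize {'label': ..., 'score': ...} before it reaches the app."""
    label = str(api_output["label"]).strip().lower()
    label = SYNONYMS.get(label, label)
    if label not in EXPECTED_LABELS:
        raise ValueError(f"unmappable label from ML API: {label!r}")
    score = float(api_output["score"])
    if score > 1.0:            # some APIs report percentages, not [0, 1]
        score /= 100.0
    return {"label": label, "score": score}

print(adapt({"label": "POS", "score": 93}))   # {'label': 'positive', 'score': 0.93}
```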
Mobius: Synthesizing Relational Queries with Recursive and Invented Predicates
Aalok Thakkar, Nathaniel Sands, George Petrou, Rajeev Alur, Mayur Naik, Mukund Raghothaman
Synthesizing relational queries from data is challenging in the presence of recursion and invented predicates. We propose a fully automated approach to synthesize such queries. Our approach comprises two steps: it first synthesizes a non-recursive query consistent with the given data, and then identifies recursion schemes in it and thereby generalizes to arbitrary data. This generalization is achieved by an iterative predicate unification procedure which exploits the notion of data provenance to accelerate convergence. In each iteration of the procedure, a constraint solver proposes a candidate query, and a query evaluator checks whether the proposed program is consistent with the given data. The data provenance for a failed query allows us to construct additional constraints for the constraint solver and refine the search. We have implemented our approach in a tool named Mobius. On a suite of 21 challenging recursive query synthesis tasks, Mobius outperforms three state-of-the-art baselines, Gensynth, ILASP, and Popper, both in terms of runtime and accuracy. We also demonstrate that the synthesized queries generalize well to unseen data.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622847
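The query-evaluator step of the loop described above, checking a candidate recursive query against the given data, can be sketched with naive fixpoint evaluation (toy facts and a fixed candidate query; this is not Mobius itself).

```python
# Illustrative sketch: evaluating a candidate recursive query (transitive
# closure) against given input/output facts by naive fixpoint iteration.

edge = {(1, 2), (2, 3), (3, 4)}                              # given input facts
expected = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}  # given output facts

def evaluate(edge_facts):
    # Candidate query:
    #   path(x, y) :- edge(x, y).
    #   path(x, z) :- path(x, y), edge(y, z).   (the recursive rule)
    path = set(edge_facts)
    while True:
        derived = {(x, z) for (x, y) in path for (y2, z) in edge_facts if y == y2}
        if derived <= path:
            return path
        path |= derived

print(evaluate(edge) == expected)   # True: the candidate is consistent with the data
```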
An Explanation Method for Models of Code
Yu Wang, Ke Wang, Linzhang Wang
This paper introduces a novel method, called WheaCha, for explaining the predictions of code models. Similar to attribution methods, WheaCha seeks to identify the input features that are responsible for a particular prediction a model makes. On the other hand, it differs from attribution methods in crucial ways. Specifically, for any given prediction, WheaCha separates an input program into "wheat" (i.e., the defining features that are the reason the model predicts the label it predicts) and the remaining "chaff". We realize WheaCha in a tool, HuoYan, and use it to explain four prominent code models: code2vec, seq-GNN, GGNN, and CodeBERT. Results show that (1) HuoYan is efficient, taking on average under twenty seconds to compute the wheat for an input program in an end-to-end fashion (i.e., including model prediction time); (2) the wheat that all models use to make predictions consists predominantly of simple syntactic or even lexical properties (i.e., identifier names); (3) neither the latest explainability methods for code models (i.e., SIVAND and CounterFactual Explanations) nor the most noteworthy attribution methods (i.e., Integrated Gradients and SHAP) can precisely capture wheat. Finally, we set out to demonstrate the usefulness of WheaCha; in particular, we assess whether WheaCha’s explanations can help end users identify defective code models (e.g., trained on mislabeled data or having learned spurious correlations from biased data). We find that, with WheaCha, users achieve far higher accuracy in identifying faulty models than with SIVAND, CounterFactual Explanations, Integrated Gradients, and SHAP.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622826
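A rough approximation of wheat extraction (hypothetical toy model; not HuoYan's algorithm) is a greedy reduction that deletes input tokens as long as the model's prediction does not change; whatever cannot be deleted is what the model actually relies on.

```python
# Illustrative sketch: approximating the "wheat" of a prediction by greedy
# token deletion against a stand-in model.

def toy_model(tokens):
    # Stand-in for a trained code model: predicts "sort" whenever the
    # identifier "swap" occurs, illustrating reliance on lexical features.
    return "sort" if "swap" in tokens else "other"

def wheat(tokens, predict):
    label = predict(tokens)
    kept = list(tokens)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if predict(trial) == label:     # same prediction without it: chaff
            kept = trial
        else:                           # removal flips the label: wheat
            i += 1
    return kept

program = ["def", "bubble", "(", "a", ")", ":", "swap", "(", "a", ")"]
print(wheat(program, toy_model))        # ['swap']
```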