Asparagus: Automated Synthesis of Parametric Gas Upper-Bounds for Smart Contracts
Zhuo Cai, Soroush Farokhnia, Amir Kafshdar Goharshady, S. Hitarth

Modern programmable blockchains have built-in support for smart contracts, i.e. programs that are stored on the blockchain and whose state is subject to consensus. After a smart contract is deployed on the blockchain, anyone on the network can interact with it and call its functions by creating transactions. The blockchain protocol is then used to reach a consensus about the order of the transactions and, as a direct corollary, the state of every smart contract. Reaching such consensus necessarily requires every node on the network to execute all function calls. Thus, an attacker can mount denial-of-service (DoS) attacks by creating expensive transactions and function calls that use considerable, or even infinite, time and space. To avoid this, following Ethereum, virtually all programmable blockchains have introduced the concept of “gas”. A fixed hard-coded gas cost is assigned to every atomic operation and the user who calls a function has to pay for its total gas usage. This technique ensures that the protocol is not vulnerable to DoS attacks, but it has also had significant unintended consequences. Out-of-gas errors, i.e. when a user underestimates the gas usage of their function call and does not allocate enough gas, are a major source of security vulnerabilities in Ethereum. We focus on the well-studied problem of automatically finding upper-bounds on the gas usage of a smart contract. This is a classical problem in the blockchain community and has also been extensively studied by researchers in programming languages and verification. In this work, we provide a novel approach using theorems from polyhedral geometry and real algebraic geometry, namely Farkas’ Lemma, Handelman’s Theorem, and Putinar’s Positivstellensatz, to automatically synthesize linear and polynomial parametric bounds for the gas usage of smart contracts. Our approach is the first to provide completeness guarantees for the synthesis of such parametric upper-bounds. Moreover, our theoretical results are independent of the underlying consensus protocol and can be applied to smart contracts written in any language and run on any blockchain. As a proof of concept, we also provide a tool, called “Asparagus”, that implements our algorithms for Ethereum contracts written in Solidity. Finally, we provide extensive experimental results over 24,188 real-world smart contracts that are currently deployed on the Ethereum blockchain. We compare Asparagus against GASTAP, the only previous tool that could provide parametric bounds, and show that our method significantly outperforms it, both in terms of applicability and the tightness of the resulting bounds. More specifically, our approach can handle 80.56% of the functions (126,269 out of 156,735), compared with GASTAP’s 58.62%. Additionally, even on the benchmarks where both approaches successfully synthesize a bound, our bound is tighter in 97.85% of the cases.
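As background for the toolbox named above, the affine form of Farkas’ Lemma characterizes exactly when a linear bound is a consequence of a system of linear constraints; the following is a textbook statement (an illustration of the general template-based recipe, not the paper’s exact formulation):

```latex
% Affine Farkas' Lemma: assuming {x : Ax <= b} is non-empty,
%   forall x. (Ax <= b  =>  c^T x <= d)
% holds exactly when non-negative multipliers y witness the entailment.
\forall x\,\bigl(Ax \le b \;\Rightarrow\; c^{\top}x \le d\bigr)
\;\Longleftrightarrow\;
\exists y \ge 0.\;\; y^{\top}A = c^{\top} \;\wedge\; y^{\top}b \le d
```

Treating the coefficients c and d of a candidate gas bound as unknowns turns the right-hand side into constraints over y, c, and d that a solver can discharge; Handelman’s Theorem and Putinar’s Positivstellensatz play the analogous role when the bound template is polynomial rather than linear.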
Proceedings of the ACM on Programming Languages, 2023-10-16. https://doi.org/10.1145/3622829

Synthesizing Specifications
Kanghee Park, Loris D'Antoni, Thomas Reps

Every program should be accompanied by a specification that describes important aspects of the code's behavior, but writing good specifications is often harder than writing the code itself. This paper addresses the problem of synthesizing specifications automatically, guided by user-supplied inputs of two kinds: i) a query posed about a set of function definitions, and ii) a domain-specific language L in which the extracted properties are to be expressed (we call properties in the language L-properties). Our method synthesizes a set of L-properties, each of which is a best L-property for the query: there is no other L-property that is strictly more precise. Furthermore, the set of synthesized L-properties is exhaustive: no more L-properties can be added to it to make the conjunction more precise. We implemented our method in a tool, Spyro. The ability to modify both the query and L provides a Spyro user with ways to customize the kind of specification to be synthesized. We use this ability to show that Spyro can be used in a variety of applications, such as mining program specifications, performing abstract-domain operations, and synthesizing algebraic properties of program modules.
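To make “sound” and “best L-property” concrete, here is a toy, hypothetical illustration in Rust (it is not Spyro, which synthesizes properties rather than testing them; the DSL, the `append` function, and the sampling are all invented for this sketch): a tiny language of length relations in which all three candidates hold of `append`, but only the equality is a best L-property.

```rust
// Toy illustration only; Spyro synthesizes properties, it does not test them.
// Hypothetical DSL: L = { len(out) OP len(x) + len(y) | OP in {==, >=, <=} }.

fn append(x: &[i32], y: &[i32]) -> Vec<i32> {
    let mut out = x.to_vec();
    out.extend_from_slice(y);
    out
}

fn main() {
    // Candidate L-properties, listed from most precise to least precise.
    let candidates: Vec<(&str, fn(usize, usize, usize) -> bool)> = vec![
        ("len(out) == len(x) + len(y)", |o, x, y| o == x + y),
        ("len(out) >= len(x) + len(y)", |o, x, y| o >= x + y),
        ("len(out) <= len(x) + len(y)", |o, x, y| o <= x + y),
    ];
    // A handful of sample inputs; a candidate that holds on all of them is
    // (empirically) sound in this toy setting.
    let samples: Vec<(Vec<i32>, Vec<i32>)> = vec![
        (vec![], vec![]),
        (vec![1], vec![2, 3]),
        (vec![4, 5, 6], vec![]),
    ];
    let sound: Vec<&str> = candidates
        .iter()
        .filter(|(_, prop)| {
            samples
                .iter()
                .all(|(x, y)| prop(append(x, y).len(), x.len(), y.len()))
        })
        .map(|(name, _)| *name)
        .collect();
    // All three candidates are sound, but only the equality is a *best*
    // L-property: no other property in L is strictly more precise than it.
    println!("sound L-properties: {:?}", sound);
    println!("best  L-property:   {}", sound[0]);
}
```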
Proceedings of the ACM on Programming Languages, 2023-10-16. https://doi.org/10.1145/3622861

Modular Verification of Safe Memory Reclamation in Concurrent Separation Logic
Jaehwang Jung, Janggun Lee, Jaemin Choi, Jaewoo Kim, Sunho Park, Jeehoon Kang

Formal verification is an effective method to address the challenge of designing correct and efficient concurrent data structures. But verification efforts often ignore memory reclamation, which involves nontrivial synchronization between concurrent accesses and reclamation. When incorrectly implemented, it may lead to critical safety errors such as use-after-free and the ABA problem. Semi-automatic safe memory reclamation schemes such as hazard pointers and RCU encapsulate the complexity of manual memory management in modular interfaces. However, this modularity has not been carried over to formal verification. We propose modular specifications of hazard pointers and RCU, and formally verify realistic implementations of them in concurrent separation logic. Specifically, we design abstract predicates for hazard pointers that capture the meaning of validating the protection of nodes, and those for RCU that support optimistic traversal to possibly retired nodes. We demonstrate that the specifications indeed facilitate modular verification along three criteria: compositional verification, general applicability, and easy integration. In doing so, we present the first formal verification of Harris’s list, the Harris-Michael list, the Chase-Lev deque, and RDCSS with reclamation. We report the Coq mechanization of all our results in the Iris separation logic framework.
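As a hedged sketch of the “validate the protection” step that the hazard-pointer predicates are about (the function name `protect` and the single hazard slot are illustrative; this is not the interface verified in the paper):

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Minimal sketch of the hazard-pointer "announce, then re-validate" pattern.
// A real SMR scheme also needs per-thread hazard slots, retirement lists, and
// scanning; those parts are elided here.
fn protect<T>(hazard_slot: &AtomicPtr<T>, shared: &AtomicPtr<T>) -> *mut T {
    loop {
        let p = shared.load(Ordering::Acquire);
        // Announce the intent to access `p`: reclaimers must not free a node
        // while some thread's hazard slot points to it.
        hazard_slot.store(p, Ordering::SeqCst);
        // Re-validate: if `shared` still points to `p`, the announcement became
        // visible before `p` could have been retired, so dereferencing `p`
        // cannot be a use-after-free.
        if shared.load(Ordering::Acquire) == p {
            return p;
        }
        // Otherwise `p` may already be retired (and freed); try again.
    }
}

fn main() {
    let node = Box::into_raw(Box::new(42_i32));
    let shared = AtomicPtr::new(node);
    let hazard = AtomicPtr::new(std::ptr::null_mut());
    let p = protect(&hazard, &shared);
    // Safe to read through `p` while the hazard slot protects it.
    unsafe { assert_eq!(*p, 42) };
    // Single-threaded demo cleanup.
    unsafe { drop(Box::from_raw(p)) };
}
```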
Proceedings of the ACM on Programming Languages, 2023-10-16. https://doi.org/10.1145/3622827

When Concurrency Matters: Behaviour-Oriented Concurrency
Luke Cheeseman, Matthew J. Parkinson, Sylvan Clebsch, Marios Kogias, Sophia Drossopoulou, David Chisnall, Tobias Wrigstad, Paul Liétar

Expressing parallelism and coordination is central to modern concurrent programming. Many mechanisms exist for expressing both, but their design decisions are tightly intertwined. We believe that this interdependence should be recognised and that both concerns should be served by a single, powerful primitive. We are not the first to realise this: the prime example is actor model programming, where parallelism arises through fine-grained decomposition of a program’s state into actors that are able to execute independently in parallel. However, actor model programming has a serious pain point: updating multiple actors as a single atomic operation is a challenging task. We address this pain point by introducing a new concurrency paradigm: Behaviour-Oriented Concurrency (BoC). In BoC, we revisit the fundamental concept of a behaviour to provide a more transactional concurrency model. BoC enables asynchronously creating atomic and ordered units of work with exclusive access to a collection of independent resources. In this paper, we describe BoC informally in terms of examples, which demonstrate the advantages of exclusive access to several independent resources, as well as the need for ordering. We define it through a formal model. We demonstrate its practicality by implementing a C++ runtime. We argue its applicability through the Savina benchmark suite: benchmarks in this suite can be represented more compactly using BoC in place of actors, and we observe comparable, if not better, performance.
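To convey the programming model in code, the following is a deliberately simplified, hypothetical Rust sketch (the paper’s runtime is C++, and real BoC behaviours are additionally asynchronous, ordered, and free of the deadlock this toy could exhibit if resources were passed in inconsistent order): a behaviour runs with exclusive access to several resources at once, so a two-account transfer is one atomic unit of work rather than a multi-message actor protocol.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical, simplified sketch of the BoC idea (not the paper's C++ API):
// a "behaviour" runs with exclusive access to both resources. A real BoC
// runtime also guarantees ordering and acquires resources in a globally
// consistent order, which this toy does not attempt.
fn when2<A, B>(
    a: Arc<Mutex<A>>,
    b: Arc<Mutex<B>>,
    body: impl FnOnce(&mut A, &mut B) + Send + 'static,
) -> thread::JoinHandle<()>
where
    A: Send + 'static,
    B: Send + 'static,
{
    thread::spawn(move || {
        // Acquire exclusive access to both resources, then run the behaviour.
        let mut ga = a.lock().unwrap();
        let mut gb = b.lock().unwrap();
        body(&mut *ga, &mut *gb);
    })
}

fn main() {
    let alice = Arc::new(Mutex::new(100_i64));
    let bob = Arc::new(Mutex::new(0_i64));
    // One behaviour transfers 30 from alice to bob as a single atomic unit of
    // work; in a pure actor model this would require a multi-message protocol.
    let h = when2(Arc::clone(&alice), Arc::clone(&bob), |a, b| {
        *a -= 30;
        *b += 30;
    });
    h.join().unwrap();
    println!("alice = {}, bob = {}", alice.lock().unwrap(), bob.lock().unwrap());
}
```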
Proceedings of the ACM on Programming Languages, 2023-10-16. https://doi.org/10.1145/3622852

Reference Capabilities for Flexible Memory Management
Ellen Arvidsson, Elias Castegren, Sylvan Clebsch, Sophia Drossopoulou, James Noble, Matthew J. Parkinson, Tobias Wrigstad

Verona is a concurrent object-oriented programming language that organises all the objects in a program into a forest of isolated regions. Memory is managed locally for each region, so programmers can control a program's memory use by adjusting objects' partition into regions, and by setting each region's memory management strategy. A thread can only mutate (allocate, deallocate) objects within one active region -- its "window of mutability". Memory management costs are localised to the active region, ensuring overheads can be predicted and controlled. Moving the mutability window between regions is explicit, so code can be executed wherever it is required, yet programs remain in control of memory use. An ownership type system based on reference capabilities enforces region isolation, controlling aliasing within and between regions, yet supporting objects moving between regions and threads. Data accesses never need expensive atomic operations, and are always thread-safe.
Proceedings of the ACM on Programming Languages, 2023-10-16. https://doi.org/10.1145/3622846

Graph IRs for Impure Higher-Order Languages: Making Aggressive Optimizations Affordable with Precise Effect Dependencies
Oliver Bračevac, Guannan Wei, Songlin Jia, Supun Abeysinghe, Yuxuan Jiang, Yuyan Bao, Tiark Rompf

Graph-based intermediate representations (IRs) are widely used for powerful compiler optimizations, either interprocedurally in pure functional languages, or intraprocedurally in imperative languages. Yet so far, no suitable graph IR exists for aggressive global optimizations in languages with both effects and higher-order functions: aliasing and indirect control transfers make it difficult to maintain sufficiently granular dependency information for optimizations to be effective. To close this long-standing gap, we propose a novel typed graph IR combining a notion of reachability types with an expressive effect system to compute precise and granular effect dependencies at an affordable cost while supporting local reasoning and separate compilation. Our high-level graph IR imposes lexical structure to represent structured control flow and nesting, enabling aggressive and yet inexpensive code motion and other optimizations for impure higher-order programs. We formalize the new graph IR based on a λ-calculus with a reachability type-and-effect system along with a specification of various optimizations. We present performance case studies for tensor loop fusion, CUDA kernel fusion, symbolic execution of LLVM IR, and SQL query compilation in the Scala LMS compiler framework using the new graph IR. We observe significant speedups of up to 21x.
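The following illustrative Rust fragment (ours, not the paper’s IR) shows the kind of distinction granular effect dependencies make possible: with a single coarse “impure” effect, the two calls in the loop must stay in program order, whereas per-reference reachability information reveals that they touch disjoint state and can therefore be reordered or moved.

```rust
fn main() {
    let mut log_a: Vec<u32> = Vec::new();
    let mut log_b: Vec<u32> = Vec::new();

    // Two higher-order "writers" that capture disjoint mutable state.
    let mut write_a = |x: u32| log_a.push(x);
    let mut write_b = |x: u32| log_b.push(x);

    for i in 0..4 {
        // A coarse effect system sees two opaque effectful calls and must keep
        // their order. Precise, per-reference effect dependencies show that
        // `write_a` only reaches `log_a` and `write_b` only reaches `log_b`,
        // so the two calls are independent and may be reordered or parallelised.
        write_a(i);
        write_b(i * 10);
    }

    println!("{:?} {:?}", log_a, log_b);
}
```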
Proceedings of the ACM on Programming Languages, 2023-10-16. https://doi.org/10.1145/3622813

Verifying Indistinguishability of Privacy-Preserving Protocols
Kirby Linvill, Gowtham Kaki, Eric Wustrow

Internet users rely on the protocols they use to protect their private information, including their identity and the websites they visit. Formal verification of these protocols can detect subtle bugs that compromise these protections at design time, but is a challenging task as it involves probabilistic reasoning about random sampling, cryptographic primitives, and concurrent execution. Existing approaches either reason about symbolic models of the protocols that sacrifice precision for automation, or reason about more precise computational models that are harder to automate and require cryptographic expertise. In this paper we propose a novel approach to verifying privacy-preserving protocols that is more precise than symbolic models yet more accessible than computational models. Our approach permits direct-style proofs of privacy, as opposed to indirect game-based proofs in computational models, by formalizing privacy as indistinguishability of possible network traces induced by a protocol. We ease automation by leveraging insights from the distributed systems verification community to create sound synchronous models of concurrent protocols. Our verification framework is implemented in F* as a library we call Waldo. We describe two large case studies of using Waldo to verify indistinguishability; one on the Encrypted Client Hello (ECH) extension of the TLS protocol and another on a Private Information Retrieval (PIR) protocol. We uncover subtle flaws in the TLS ECH specification that were missed by other models.
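One standard way to phrase the trace-based privacy property described here, stated as an illustrative formulation rather than Waldo’s exact definition: a protocol P keeps a secret input private if any two choices of the secret induce the same distribution over observable network traces.

```latex
% Privacy as trace indistinguishability (illustrative formulation):
\forall s_1, s_2 \in \mathit{Secrets}.\;\forall t.\;\;
\Pr\bigl[\,\mathrm{Trace}(P(s_1)) = t\,\bigr]
\;=\;
\Pr\bigl[\,\mathrm{Trace}(P(s_2)) = t\,\bigr]
```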
Proceedings of the ACM on Programming Languages, 2023-10-16. https://doi.org/10.1145/3622849

Mutually Iso-Recursive Subtyping
Andreas Rossberg

Iso-recursive types are often taken as a type-theoretic model for type recursion as present in many programming languages, e.g., classes in object-oriented languages or algebraic datatypes in functional languages. Their main advantage over an equi-recursive semantics is that they are simpler and algorithmically less expensive, which is an important consideration when the cost of type checking matters, such as for intermediate or low-level code representations, virtual machines, or runtime casts. However, a closer look reveals that iso-recursion cannot, in its standard form, efficiently express essential type system features like mutual recursion or non-uniform recursion. While it has been folklore that mutual recursion and non-uniform type parameterisation can nicely be handled by generalising to higher kinds, this encoding breaks down when combined with subtyping: the classic “Amber” rule for subtyping iso-recursive types is too weak to express mutual recursion without falling back to encodings of quadratic size. We present a foundational core calculus of iso-recursive types with declared subtyping that can express both inter- and intra-recursion subtyping without such blowup, including subtyping between constructors of higher or mixed kind. In a second step, we identify a syntactic fragment of this general calculus that allows for more efficient type checking without “deep” substitutions, by observing that higher-kinded iso-recursive types can be inserted to “guard” against unwanted β-reductions. This fragment closely resembles the structure of typical nominal subtype systems, but without requiring nominal semantics. It has been used as the basis for a proposed extension of WebAssembly with recursive types.
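For reference, the “Amber” rule mentioned above is the classic subtyping rule for iso-recursive types: assume the bound variables are related and compare the bodies. In textbook notation (not the paper’s generalised, higher-kinded calculus):

```latex
% The Amber rule for iso-recursive subtyping (alpha, beta fresh):
\frac{\Gamma,\; \alpha \le \beta \;\vdash\; \tau_1 \le \tau_2}
     {\Gamma \;\vdash\; \mu\alpha.\,\tau_1 \;\le\; \mu\beta.\,\tau_2}
```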
Proceedings of the ACM on Programming Languages, 2023-10-16. https://doi.org/10.1145/3622809

Solving String Constraints with Lengths by Stabilization
Yu-Fang Chen, David Chocholatý, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, Juraj Síč

We present a new algorithm for solving string constraints. The algorithm builds upon a recent method for solving word equations and regular constraints that interprets string variables as languages rather than strings and, consequently, mitigates the combinatorial explosion that plagues other approaches. We extend the approach to handle linear integer arithmetic length constraints by combination with a known principle of equation alignment and splitting, and by extension to other common types of string constraints, yielding a fully-fledged string solver. The ability of the framework to handle unrestricted disequalities even extends one of the largest decidable classes of string constraints, the chain-free fragment. We integrate our algorithm into a DPLL-based SMT solver. The performance of our implementation is competitive and even significantly better than state-of-the-art string solvers on several established benchmarks obtained from applications in verification of string programs.
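As a small, constructed illustration of the constraint class involved (not taken from the paper’s benchmarks), consider a word equation combined with a regular membership and a length constraint:

```latex
x \cdot y \;=\; y \cdot x
\;\wedge\;
x \in (ab)^{*}
\;\wedge\;
|x| + |y| \;=\; 6
```

The word equation forces x and y to be repetitions of a common word, so one solution is x = ab, y = abab; a solver of the kind described above has to reason about the equation, the regular constraint, and the integer length constraint together, which is where interpreting variables as languages and aligning/splitting equations comes in.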
Proceedings of the ACM on Programming Languages, 2023-10-16. https://doi.org/10.1145/3622872

Bring Your Own Data Structures to Datalog
Arash Sahebolamri, Langston Barrett, Scott Moore, Kristopher Micinski

The restricted logic programming language Datalog has become a popular implementation target for deductive-analytic workloads including social-media analytics and program analysis. Modern Datalog engines compile Datalog rules to joins over explicit representations of relations—often B-trees or hash maps. While these modern engines have enabled high scalability in many application domains, they have a crucial weakness: achieving the desired algorithmic complexity may be impossible due to representation-imposed overhead of the engine’s data structures. In this paper, we present the "Bring Your Own Data Structures" (Byods) approach, in the form of a DSL embedded in Rust. Using Byods, an engineer writes logical rules which are implicitly parametric on the concrete data structure representation; our implementation provides an interface to enable "bringing their own" data structures to represent relations, which harmoniously interact with code generated by our compiler (implemented as Rust procedural macros). We formalize the semantics of Byods as an extension of Datalog’s; our formalization captures the key properties demanded of data structures compatible with Byods, including properties required for incrementalized (semi-naïve) evaluation. We detail many applications of the Byods approach, implementing analyses requiring specialized data structures for transitive and equivalence relations to scale, including an optimized version of the Rust borrow checker Polonius; highly-parallel PageRank made possible by lattices; and a large-scale analysis of LLVM utilizing index-sharing to scale. Our results show that Byods offers both improved algorithmic scalability (reduced time and/or space complexity) and runtimes competitive with state-of-the-art parallelizing Datalog solvers.
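For flavour, the style of embedded Datalog this builds on is shown below as a minimal transitive-closure program written against the authors’ `ascent` crate (a sketch; exact macro details may differ). The Byods extension described above additionally lets a relation such as `path` declare which data structure backs it, for example a union-find-like structure for reachability, without changing the rules themselves.

```rust
use ascent::ascent;

// Transitive closure in embedded Datalog, in the style of the `ascent` crate.
ascent! {
    relation edge(u32, u32);
    relation path(u32, u32);

    path(x, y) <-- edge(x, y);
    path(x, z) <-- edge(x, y), path(y, z);
}

fn main() {
    let mut prog = AscentProgram::default();
    prog.edge = vec![(1, 2), (2, 3), (3, 4)];
    prog.run();
    // `path` now contains the transitive closure: (1,2), (1,3), (1,4), (2,3), ...
    println!("{:?}", prog.path);
}
```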
Proceedings of the ACM on Programming Languages, 2023-10-16. https://doi.org/10.1145/3622840