Pub Date : 2022-08-17DOI: https://dl.acm.org/doi/10.1145/3527634
Matthijs Vákár, Tom Smeding
We introduce Combinatory Homomorphic Automatic Differentiation (CHAD), a principled, pure, provably correct define-then-run method for performing forward and reverse mode automatic differentiation (AD) on programming languages with expressive features. It implements AD as a compositional, type-respecting source-code transformation that generates purely functional code. This code transformation is principled in the sense that it is the unique homomorphic (structure preserving) extension to expressive languages of Elliott’s well-known and unambiguous definitions of AD for a first-order functional language. Correctness of the method follows by a (compositional) logical relations argument that shows that the semantics of the syntactic derivative is the usual calculus derivative of the semantics of the original program.
In their most elegant formulation, the transformations generate code with linear types. However, the code transformations can be implemented in a standard functional language lacking linear types: While the correctness proof requires tracking of linearity, the actual transformations do not. In fact, even in a standard functional language, we can get all of the type-safety that linear types give us: We can implement all linear types used to type the transformations as abstract types by using a basic module system.
In this article, we detail the method when applied to a simple higher-order language for manipulating statically sized arrays. However, we explain how the methodology applies, more generally, to functional languages with other expressive features. Finally, we discuss how the scope of CHAD extends beyond applications in AD to other dynamic program analyses that accumulate data in a commutative monoid.
{"title":"CHAD: Combinatory Homomorphic Automatic Differentiation","authors":"Matthijs Vákár, Tom Smeding","doi":"https://dl.acm.org/doi/10.1145/3527634","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3527634","url":null,"abstract":"<p>We introduce Combinatory Homomorphic Automatic Differentiation (CHAD), a principled, pure, provably correct define-then-run method for performing forward and reverse mode automatic differentiation (AD) on programming languages with expressive features. It implements AD as a compositional, type-respecting source-code transformation that generates purely functional code. This code transformation is principled in the sense that it is the unique homomorphic (structure preserving) extension to expressive languages of Elliott’s well-known and unambiguous definitions of AD for a first-order functional language. Correctness of the method follows by a (compositional) logical relations argument that shows that the semantics of the syntactic derivative is the usual calculus derivative of the semantics of the original program.</p><p>In their most elegant formulation, the transformations generate code with linear types. However, the code transformations can be implemented in a standard functional language lacking linear types: While the correctness proof requires tracking of linearity, the actual transformations do not. In fact, even in a standard functional language, we can get all of the type-safety that linear types give us: We can implement all linear types used to type the transformations as abstract types by using a basic module system.</p><p>In this article, we detail the method when applied to a simple higher-order language for manipulating statically sized arrays. However, we explain how the methodology applies, more generally, to functional languages with other expressive features. Finally, we discuss how the scope of CHAD extends beyond applications in AD to other dynamic program analyses that accumulate data in a commutative monoid.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"81 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138531569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. Vasconcelos, F. Martins, Hugo A. López, N. Yoshida
We present ParTypes, a type discipline for parallel programs. The model we have in mind comprises a fixed number of processes running in parallel and communicating via collective operations or point-to-point synchronous message exchanges. A type describes a protocol to be followed by each processes in a given program. We present the type theory, a core imperative programming language and its operational semantics, and prove that type checking is decidable (up to decidability of semantic entailment) and that well-typed programs do not deadlock and always terminate. The article is accompanied by a large number of examples drawn from the literature on parallel programming.
{"title":"A Type Discipline for Message Passing Parallel Programs","authors":"V. Vasconcelos, F. Martins, Hugo A. López, N. Yoshida","doi":"10.1145/3552519","DOIUrl":"https://doi.org/10.1145/3552519","url":null,"abstract":"We present ParTypes, a type discipline for parallel programs. The model we have in mind comprises a fixed number of processes running in parallel and communicating via collective operations or point-to-point synchronous message exchanges. A type describes a protocol to be followed by each processes in a given program. We present the type theory, a core imperative programming language and its operational semantics, and prove that type checking is decidable (up to decidability of semantic entailment) and that well-typed programs do not deadlock and always terminate. The article is accompanied by a large number of examples drawn from the literature on parallel programming.","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"44 1","pages":"1 - 55"},"PeriodicalIF":1.3,"publicationDate":"2022-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48994494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-07-15DOI: https://dl.acm.org/doi/full/10.1145/3539656
Ankush Das, Henry Deyoung, Andreia Mordido, Frank Pfenning
Session types statically describe communication protocols between concurrent message-passing processes. Unfortunately, parametric polymorphism even in its restricted prenex form is not fully understood in the context of session types. In this article, we present the metatheory of session types extended with prenex polymorphism and, as a result, nested recursive datatypes. Remarkably, we prove that type equality is decidable by exhibiting a reduction to trace equivalence of deterministic first-order grammars. Recognizing the high theoretical complexity of the latter, we also propose a novel type equality algorithm and prove its soundness. We observe that the algorithm is surprisingly efficient and, despite its incompleteness, sufficient for all our examples. We have implemented our ideas by extending the Rast programming language with nested session types. We conclude with several examples illustrating the expressivity of our enhanced type system.
{"title":"Nested Session Types","authors":"Ankush Das, Henry Deyoung, Andreia Mordido, Frank Pfenning","doi":"https://dl.acm.org/doi/full/10.1145/3539656","DOIUrl":"https://doi.org/https://dl.acm.org/doi/full/10.1145/3539656","url":null,"abstract":"<p>Session types statically describe communication protocols between concurrent message-passing processes. Unfortunately, parametric polymorphism even in its restricted prenex form is not fully understood in the context of session types. In this article, we present the metatheory of session types extended with prenex polymorphism and, as a result, nested recursive datatypes. Remarkably, we prove that type equality is decidable by exhibiting a reduction to trace equivalence of deterministic first-order grammars. Recognizing the high theoretical complexity of the latter, we also propose a novel type equality algorithm and prove its soundness. We observe that the algorithm is surprisingly efficient and, despite its incompleteness, sufficient for all our examples. We have implemented our ideas by extending the Rast programming language with nested session types. We conclude with several examples illustrating the expressivity of our enhanced type system.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"1 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138542085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-07-15DOI: https://dl.acm.org/doi/full/10.1145/3527633
Alex C. Keizer, Henning Basold, Jorge A. Pérez
Compositional methods are central to the verification of software systems. For concurrent and communicating systems, compositional techniques based on behavioural type systems have received much attention. By abstracting communication protocols as types, these type systems can statically check that channels in a program interact following a certain protocol—whether messages are exchanged in the intended order.
In this article, we put on our coalgebraic spectacles to investigate session types, a widely studied class of behavioural type systems. We provide a syntax-free description of session-based concurrency as states of coalgebras. As a result, we rediscover type equivalence, duality, and subtyping relations in terms of canonical coinductive presentations. In turn, this coinductive presentation enables us to derive a decidable type system with subtyping for the π-calculus, in which the states of a coalgebra will serve as channel protocols. Going full circle, we exhibit a coalgebra structure on an existing session type system, and show that the relations and type system resulting from our coalgebraic perspective coincide with existing ones. We further apply to session coalgebras the coalgebraic approach to regular languages via the so-called rational fixed point, inspired by the trinity of automata, regular languages, and regular expressions with session coalgebras, rational fixed point, and session types, respectively. We establish a suitable restriction on session coalgebras that determines a similar trinity, and reveals the mismatch between usual session types and our syntax-free coalgebraic approach. Furthermore, we extend our coalgebraic approach to account for context-free session types, by equipping session coalgebras with a stack.
{"title":"Session Coalgebras: A Coalgebraic View on Regular and Context-free Session Types","authors":"Alex C. Keizer, Henning Basold, Jorge A. Pérez","doi":"https://dl.acm.org/doi/full/10.1145/3527633","DOIUrl":"https://doi.org/https://dl.acm.org/doi/full/10.1145/3527633","url":null,"abstract":"<p>Compositional methods are central to the verification of software systems. For concurrent and communicating systems, compositional techniques based on <i>behavioural type systems</i> have received much attention. By abstracting communication protocols as types, these type systems can statically check that channels in a program interact following a certain protocol—whether messages are exchanged in the intended order.</p><p>In this article, we put on our coalgebraic spectacles to investigate <i>session types</i>, a widely studied class of behavioural type systems. We provide a syntax-free description of session-based concurrency as states of coalgebras. As a result, we rediscover type equivalence, duality, and subtyping relations in terms of canonical coinductive presentations. In turn, this coinductive presentation enables us to derive a decidable type system with subtyping for the π-calculus, in which the states of a coalgebra will serve as channel protocols. Going full circle, we exhibit a coalgebra structure on an existing session type system, and show that the relations and type system resulting from our coalgebraic perspective coincide with existing ones. We further apply to session coalgebras the coalgebraic approach to regular languages via the so-called rational fixed point, inspired by the trinity of automata, regular languages, and regular expressions with session coalgebras, rational fixed point, and session types, respectively. We establish a suitable restriction on session coalgebras that determines a similar trinity, and reveals the mismatch between usual session types and our syntax-free coalgebraic approach. Furthermore, we extend our coalgebraic approach to account for <i>context-free</i> session types, by equipping session coalgebras with a stack.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"28 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138531544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-07-15DOI: https://dl.acm.org/doi/full/10.1145/3486169
Maximilian P. L. Haslbeck, Peter Lammich
We present a framework to verify both, functional correctness and (amortized) worst-case complexity of practically efficient algorithms. We implemented a stepwise refinement approach, using the novel concept of resource currencies to naturally structure the resource analysis along the refinement chain, and allow a fine-grained analysis of operation counts. Our framework targets the LLVM intermediate representation. We extend its semantics from earlier work with a cost model. As case studies, we verify the amortized constant time push operation on dynamic arrays and the O(nlog n) introsort algorithm, and refine them down to efficient LLVM implementations. Our sorting algorithm performs on par with the state-of-the-art implementation found in the GNU C++ Library, and provably satisfies the complexity required by the C++ standard.
{"title":"For a Few Dollars More: Verified Fine-Grained Algorithm Analysis Down to LLVM","authors":"Maximilian P. L. Haslbeck, Peter Lammich","doi":"https://dl.acm.org/doi/full/10.1145/3486169","DOIUrl":"https://doi.org/https://dl.acm.org/doi/full/10.1145/3486169","url":null,"abstract":"<p>We present a framework to verify both, functional correctness and (amortized) worst-case complexity of practically efficient algorithms. We implemented a stepwise refinement approach, using the novel concept of <i>resource currencies</i> to naturally structure the resource analysis along the refinement chain, and allow a fine-grained analysis of operation counts. Our framework targets the LLVM intermediate representation. We extend its semantics from earlier work with a cost model. As case studies, we verify the amortized constant time push operation on dynamic arrays and the <i>O</i>(<i>n</i>log <i>n</i>) introsort algorithm, and refine them down to efficient LLVM implementations. Our sorting algorithm performs on par with the state-of-the-art implementation found in the GNU C++ Library, and provably satisfies the complexity required by the C++ standard.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"38 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138531567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-07-15DOI: https://dl.acm.org/doi/full/10.1145/3495529
Patrick Baillot, Alexis Ghyselen
Type systems as a technique to analyse or control programs have been extensively studied for functional programming languages. In particular, some systems allow one to extract from a typing derivation a complexity bound on the program. We explore how to extend such results to parallel complexity in the setting of pi-calculus, considered as a communication-based model for parallel computation. Two notions of time complexity are given: the total computation time without parallelism (the work) and the computation time under maximal parallelism (the span). We define operational semantics to capture those two notions and present two type systems from which one can extract a complexity bound on a process. The type systems are inspired both by sized types and by input/output types, with additional temporal information about communications.
{"title":"Types for Complexity of Parallel Computation in Pi-calculus","authors":"Patrick Baillot, Alexis Ghyselen","doi":"https://dl.acm.org/doi/full/10.1145/3495529","DOIUrl":"https://doi.org/https://dl.acm.org/doi/full/10.1145/3495529","url":null,"abstract":"<p>Type systems as a technique to analyse or control programs have been extensively studied for functional programming languages. In particular, some systems allow one to extract from a typing derivation a complexity bound on the program. We explore how to extend such results to parallel complexity in the setting of pi-calculus, considered as a communication-based model for parallel computation. Two notions of time complexity are given: the total computation time without parallelism (the work) and the computation time under maximal parallelism (the span). We define operational semantics to capture those two notions and present two type systems from which one can extract a complexity bound on a process. The type systems are inspired both by sized types and by input/output types, with additional temporal information about communications.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"39 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138531570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-07-15DOI: https://dl.acm.org/doi/full/10.1145/3498847
Jens Pagel, Florian Zuleger
Most automated verifiers for separation logic are based on the symbolic-heap fragment, which disallows both the magic-wand operator and the application of classical Boolean operators to spatial formulas. This is not surprising, as support for the magic wand quickly leads to undecidability, especially when combined with inductive predicates for reasoning about data structures. To circumvent these undecidability results, we propose assigning a more restrictive semantics to the separating conjunction. We argue that the resulting logic, strong-separation logic, can be used for symbolic execution and abductive reasoning just like “standard” separation logic, while remaining decidable even in the presence of both the magic wand and inductive predicates (we consider a list-segment predicate and a tree predicate)—a combination of features that leads to undecidability for the standard semantics.
{"title":"Strong-separation Logic","authors":"Jens Pagel, Florian Zuleger","doi":"https://dl.acm.org/doi/full/10.1145/3498847","DOIUrl":"https://doi.org/https://dl.acm.org/doi/full/10.1145/3498847","url":null,"abstract":"<p>Most automated verifiers for separation logic are based on the symbolic-heap fragment, which disallows both the magic-wand operator and the application of classical Boolean operators to spatial formulas. This is not surprising, as support for the magic wand quickly leads to undecidability, especially when combined with inductive predicates for reasoning about data structures. To circumvent these undecidability results, we propose assigning a more restrictive semantics to the separating conjunction. We argue that the resulting logic, strong-separation logic, can be used for symbolic execution and abductive reasoning just like “standard” separation logic, while remaining decidable even in the presence of both the magic wand and inductive predicates (we consider a list-segment predicate and a tree predicate)—a combination of features that leads to undecidability for the standard semantics.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"16 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138531568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Determining upper bounds on the time complexity of a program is a fundamental problem with a variety of applications, such as performance debugging, resource certification, and compile-time optimizations. Automated techniques for cost analysis excel at bounding the resource complexity of programs that use integer values and linear arithmetic. Unfortunately, they fall short when the complexity depends more intricately on the evolution of data during execution. In such cases, state-of-the-art analyzers have shown to produce loose bounds, or even no bound at all.
We propose a novel technique that generalizes the common notion of recurrence relations based on ranking functions. Existing methods usually unfold one loop iteration and examine the resulting arithmetic relations between variables. These relations assist in establishing a recurrence that bounds the number of loop iterations. We propose a different approach, where we derive recurrences by comparing whole traces with whole traces of a lower rank, avoiding the need to analyze the complexity of intermediate states. We offer a set of global properties, defined with respect to whole traces, that facilitate such a comparison and show that these properties can be checked efficiently using a handful of local conditions. To this end, we adapt state squeezers, an induction mechanism previously used for verifying safety properties. We demonstrate that this technique encompasses the reasoning power of bounded unfolding, and more. We present some seemingly innocuous, yet intricate, examples that previous tools based on cost relations and control flow analysis fail to solve, and that our squeezer-powered approach succeeds.
{"title":"Runtime Complexity Bounds Using Squeezers","authors":"Oren Ish-Shalom, Shachar Itzhaky, Noam Rinetzky, Sharon Shoham","doi":"https://dl.acm.org/doi/full/10.1145/3527632","DOIUrl":"https://doi.org/https://dl.acm.org/doi/full/10.1145/3527632","url":null,"abstract":"<p>Determining upper bounds on the time complexity of a program is a fundamental problem with a variety of applications, such as performance debugging, resource certification, and compile-time optimizations. Automated techniques for cost analysis excel at bounding the resource complexity of programs that use integer values and linear arithmetic. Unfortunately, they fall short when the complexity depends more intricately on the evolution of data during execution. In such cases, state-of-the-art analyzers have shown to produce loose bounds, or even no bound at all.</p><p>We propose a novel technique that generalizes the common notion of recurrence relations based on ranking functions. Existing methods usually unfold one loop iteration and examine the resulting arithmetic relations between variables. These relations assist in establishing a recurrence that bounds the number of loop iterations. We propose a different approach, where we derive recurrences by comparing <i>whole traces</i> with <i>whole traces</i> of a lower rank, avoiding the need to analyze the complexity of intermediate states. We offer a set of global properties, defined with respect to whole traces, that facilitate such a comparison and show that these properties can be checked efficiently using a handful of local conditions. To this end, we adapt <i>state squeezers</i>, an induction mechanism previously used for verifying safety properties. We demonstrate that this technique encompasses the reasoning power of bounded unfolding, and more. We present some seemingly innocuous, yet intricate, examples that previous tools based on <i>cost relations</i> and control flow analysis fail to solve, and that our squeezer-powered approach succeeds.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"179 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138531547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-05-27DOI: https://dl.acm.org/doi/full/10.1145/3492428
Yuanbo Li, Qirun Zhang, Thomas Reps
Many program-analysis problems can be formulated as graph-reachability problems. Interleaved Dyck language reachability (InterDyck-reachability) is a fundamental framework to express a wide variety of program-analysis problems over edge-labeled graphs. The InterDyck language represents an intersection of multiple matched-parenthesis languages (i.e., Dyck languages). In practice, program analyses typically leverage one Dyck language to achieve context-sensitivity, and other Dyck languages to model data dependencies, such as field-sensitivity and pointer references/dereferences. In the ideal case, an InterDyck-reachability framework should model multiple Dyck languages simultaneously.
Unfortunately, precise InterDyck-reachability is undecidable. Any practical solution must over-approximate the exact answer. In the literature, a lot of work has been proposed to over-approximate the InterDyck-reachability formulation. This article offers a new perspective on improving both the precision and the scalability of InterDyck-reachability: we aim at simplifying the underlying input graph G. Our key insight is based on the observation that if an edge is not contributing to any InterDyck-paths, we can safely eliminate it from G. Our technique is orthogonal to the InterDyck-reachability formulation and can serve as a pre-processing step with any over-approximating approach for InterDyck-reachability. We have applied our graph simplification algorithm to pre-processing the graphs from a recent InterDyck-reachability-based taint analysis for Android. Our evaluation of three popular InterDyck-reachability algorithms yields promising results. In particular, our graph-simplification method improves both the scalability and precision of all three InterDyck-reachability algorithms, sometimes dramatically.
{"title":"Fast Graph Simplification for Interleaved-Dyck Reachability","authors":"Yuanbo Li, Qirun Zhang, Thomas Reps","doi":"https://dl.acm.org/doi/full/10.1145/3492428","DOIUrl":"https://doi.org/https://dl.acm.org/doi/full/10.1145/3492428","url":null,"abstract":"<p>Many program-analysis problems can be formulated as graph-reachability problems. Interleaved Dyck language reachability (<span>InterDyck</span>-reachability) is a fundamental framework to express a wide variety of program-analysis problems over edge-labeled graphs. The <span>InterDyck</span> language represents an intersection of multiple matched-parenthesis languages (i.e., Dyck languages). In practice, program analyses typically leverage one Dyck language to achieve context-sensitivity, and other Dyck languages to model data dependencies, such as field-sensitivity and pointer references/dereferences. In the ideal case, an <span>InterDyck</span>-reachability framework should model multiple Dyck languages <i>simultaneously</i>.</p><p>Unfortunately, precise <span>InterDyck</span>-reachability is undecidable. Any practical solution must over-approximate the exact answer. In the literature, a lot of work has been proposed to over-approximate the <span>InterDyck</span>-reachability formulation. This article offers a new perspective on improving both the precision and the scalability of <span>InterDyck</span>-reachability: we aim at simplifying the underlying input graph <i>G</i>. Our key insight is based on the observation that if an edge is not contributing to any <span>InterDyck</span>-paths, we can safely eliminate it from <i>G</i>. Our technique is orthogonal to the <span>InterDyck</span>-reachability formulation and can serve as a pre-processing step with any over-approximating approach for <span>InterDyck</span>-reachability. We have applied our graph simplification algorithm to pre-processing the graphs from a recent <span>InterDyck</span>-reachability-based taint analysis for Android. Our evaluation of three popular <span>InterDyck</span>-reachability algorithms yields promising results. In particular, our graph-simplification method improves both the scalability and precision of all three <span>InterDyck</span>-reachability algorithms, sometimes dramatically.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"1 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138531592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-05-27DOI: https://dl.acm.org/doi/full/10.1145/3517034
Kensen Shi, David Bieber, Rishabh Singh
The success and popularity of deep learning is on the rise, partially due to powerful deep learning frameworks such as TensorFlow and PyTorch, which make it easier to develop deep learning models. However, these libraries also come with steep learning curves, since programming in these frameworks is quite different from traditional imperative programming with explicit loops and conditionals. In this work, we present a tool called TF-Coder for programming by example in TensorFlow. TF-Coder uses a bottom-up weighted enumerative search, with value-based pruning of equivalent expressions and flexible type- and value-based filtering to ensure that expressions adhere to various requirements imposed by the TensorFlow library. We train models to predict TensorFlow operations from features of the input and output tensors and natural language descriptions of tasks to prioritize relevant operations during search. TF-Coder solves 63 of 70 real-world tasks within 5 minutes, sometimes finding simpler solutions in less time compared to experienced human programmers.
{"title":"TF-Coder: Program Synthesis for Tensor Manipulations","authors":"Kensen Shi, David Bieber, Rishabh Singh","doi":"https://dl.acm.org/doi/full/10.1145/3517034","DOIUrl":"https://doi.org/https://dl.acm.org/doi/full/10.1145/3517034","url":null,"abstract":"<p>The success and popularity of deep learning is on the rise, partially due to powerful deep learning frameworks such as TensorFlow and PyTorch, which make it easier to develop deep learning models. However, these libraries also come with steep learning curves, since programming in these frameworks is quite different from traditional imperative programming with explicit loops and conditionals. In this work, we present a tool called TF-Coder for programming by example in TensorFlow. TF-Coder uses a bottom-up weighted enumerative search, with value-based pruning of equivalent expressions and flexible type- and value-based filtering to ensure that expressions adhere to various requirements imposed by the TensorFlow library. We train models to predict TensorFlow operations from features of the input and output tensors and natural language descriptions of tasks to prioritize relevant operations during search. TF-Coder solves 63 of 70 real-world tasks within 5 minutes, sometimes finding simpler solutions in less time compared to experienced human programmers.</p>","PeriodicalId":50939,"journal":{"name":"ACM Transactions on Programming Languages and Systems","volume":"2 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138531539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}