From Capabilities to Regions: Enabling Efficient Compilation of Lexical Effect Handlers
Marius Müller, Philipp Schuster, Jonathan Lindegaard Starup, Klaus Ostermann, Jonathan Immanuel Brachthäuser
Effect handlers are a high-level abstraction that enables programmers to use effects in a structured way. They have gained a lot of popularity within academia and subsequently also in industry. However, the abstraction often comes with a significant runtime cost and there has been intensive research recently on how to reduce this price. A promising approach in this regard is to implement effect handlers using a CPS translation and to provide sufficient information about the nesting of handlers. With this information the CPS translation can decide how effects have to be lifted through handlers, i.e., which handlers need to be skipped, in order to handle the effect at the correct place. A structured way to make this information available is to use a calculus with a region system and explicit subregion evidence. Such calculi, however, are quite verbose, which makes them impractical to use as a source-level language. We present a method to infer the lifting information for a calculus underlying a source-level language. This calculus uses second-class capabilities for the safe use of effects. To do so, we define a typed translation to a calculus with regions and evidence and we show that this lift-inference translation is typability- and semantics-preserving. On the one hand, this exposes the precise relation between the second-class property and the structure given by regions. On the other hand, it closes a gap in a compiler pipeline enabling efficient compilation of the source-level language. We have implemented lift inference in this compiler pipeline and conducted benchmarks which indicate that the approach is indeed working.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622831
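To make the lifting problem above concrete, here is a deliberately small Python toy (our own illustration, not the paper's calculus or its compiler pipeline): handlers live on a stack, and each effect call carries an explicit lift count, the "evidence", saying how many enclosing handlers to skip before handling. Real handlers also capture continuations; this sketch only shows the dispatch-by-evidence aspect.

def handle(handler, body, stack):
    """Install `handler`, run `body` with the extended stack, then uninstall."""
    stack.append(handler)
    try:
        return body(stack)
    finally:
        stack.pop()

def perform(stack, lift, payload):
    """Dispatch to the handler `lift` frames outside the innermost one."""
    return stack[-1 - lift](payload)

outer = lambda p: f"outer handled {p!r}"
inner = lambda p: f"inner handled {p!r}"

result = handle(outer,
                lambda s: handle(inner,
                                 # lift = 1: skip `inner`, reach `outer`
                                 lambda s2: perform(s2, 1, "op"),
                                 s),
                [])
print(result)  # outer handled 'op'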
MemPerf: Profiling Allocator-Induced Performance Slowdowns
Jin Zhou, Sam Silvestro, Steven (Jiaxun) Tang, Hanmei Yang, Hongyu Liu, Guangming Zeng, Bo Wu, Cong Liu, Tongping Liu
The memory allocator plays a key role in the performance of applications, but none of the existing profilers can pinpoint performance slowdowns caused by a memory allocator. Consequently, programmers may spend time improving application code incorrectly or unnecessarily, achieving low or no performance improvement. This paper designs the first profiler—MemPerf—to identify allocator-induced performance slowdowns without comparing against another allocator. Based on the key observation that an allocator may impact the whole life-cycle of heap objects, including the accesses (or uses) of these objects, MemPerf proposes a life-cycle based detection to identify slowdowns caused by slow memory management operations and slow accesses separately. For the former, MemPerf proposes thread-aware and type-aware performance modeling to identify slow management operations. For slow memory accesses, MemPerf utilizes a top-down approach to identify all possible reasons for slow memory accesses introduced by the allocator, mainly due to cache and TLB misses, and further proposes a unified method to identify them correctly and efficiently. Based on our extensive evaluation, MemPerf reports 98% of medium and large allocator-induced slowdowns (larger than 5%) correctly without reporting any false positives. MemPerf also pinpoints multiple known and unknown design issues in widely-used allocators.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622848
Inductive Program Synthesis Guided by Observational Program Similarity
Jack Feser, Işıl Dillig, Armando Solar-Lezama
We present a new general-purpose synthesis technique for generating programs from input-output examples. Our method, called metric program synthesis, relaxes the observational equivalence idea (used widely in bottom-up enumerative synthesis) into a weaker notion of observational similarity, with the goal of reducing the search space that the synthesizer needs to explore. Our method clusters programs into equivalence classes based on an expert-provided distance metric and constructs a version space that compactly represents “approximately correct” programs. Then, given a “close enough” program sampled from this version space, our approach uses a distance-guided repair algorithm to find a program that exactly matches the given input-output examples. We have implemented our proposed metric program synthesis technique in a tool called SyMetric and evaluate it in three different domains considered in prior work. Our evaluation shows that SyMetric outperforms other domain-agnostic synthesizers that use observational equivalence and that it achieves results competitive with domain-specific synthesizers that are either designed for or trained on those domains.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622830
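A toy Python rendering of the two-phase idea described above (ours, not SyMetric or its benchmark domains): candidate programs are ranked by an output-distance metric on the examples, a "close enough" candidate is selected, and distance-guided repair nudges it into an exact match. Here a program is just a pair (a, b) denoting a*x + b, and the candidate space deliberately excludes the answer so that repair has work to do.

from itertools import product

# Input-output examples; the intended program is 2*x + 3.
EXAMPLES = [(1, 5), (2, 7), (3, 9)]

def run(prog, x):
    a, b = prog                 # a program is a pair (a, b) meaning a*x + b
    return a * x + b

def distance(prog):
    """Expert-provided metric: total absolute error on the examples."""
    return sum(abs(run(prog, x) - y) for x, y in EXAMPLES)

# Bottom-up enumeration of a small space that does NOT contain the answer.
candidates = list(product(range(3), repeat=2))        # a, b in {0, 1, 2}

# Observational similarity: keep the "approximately correct" candidates.
close_enough = [p for p in candidates if distance(p) <= 5]

def repair(prog):
    """Distance-guided repair: greedily nudge constants while the metric improves."""
    while distance(prog) > 0:
        neighbours = [(prog[0] + da, prog[1] + db)
                      for da in (-1, 0, 1) for db in (-1, 0, 1)
                      if (da, db) != (0, 0)]
        best_next = min(neighbours, key=distance)
        if distance(best_next) >= distance(prog):
            break                                     # stuck on a plateau; give up
        prog = best_next
    return prog

best = min(close_enough, key=distance)                # (2, 2), off by one
print(repair(best))                                   # (2, 3): exact match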
AtomiS: Data-Centric Synchronization Made Practical
Hervé Paulino, Ana Almeida Matos, Jan Cederquist, Marco Giunti, João Matos, António Ravara
Data-Centric Synchronization (DCS) shifts the reasoning about concurrency restrictions from control structures to data declaration. It is a high-level declarative approach that abstracts away from the actual concurrency control mechanism(s) in use. Despite its advantages, the practical use of DCS is hindered by the fact that it may require many annotations and/or multiple implementations of the same method to cope with differently qualified parameters. To overcome these limitations, in this paper we present AtomiS, a new DCS approach that requires only qualifying the types of parameters and return values in interface definitions, and of fields in class definitions. The latter may also be abstracted away in type parameters, rendering class implementations virtually annotation-free. From this high-level specification, a static analysis infers the atomicity constraints that are local to each method, considering valid only the method variants that are consistent with the specification, and performs code generation for all valid variants of each method. The generated code is then the target for automatic injection of concurrency control primitives that are responsible for ensuring the absence of data races, atomicity violations and deadlocks. We provide a Java implementation and showcase the applicability of AtomiS in real-life code. For the benchmarks analysed, AtomiS requires fewer annotations than the original number of regions requiring locks, as well as fewer annotations than Atomic Sets (a reference DCS proposal).
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622801
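AtomiS itself targets Java and infers the constraints statically, but the flavour of data-centric synchronization can be sketched dynamically in a few lines of Python (the decorator and class below are our own illustration, not AtomiS output): the synchronization requirement is stated once, with the data, and the locking code is injected mechanically rather than written at every use site.

import threading
from functools import wraps

def synchronized(cls):
    """Guard every public method of `cls` with a per-instance lock."""
    def locked(method):
        @wraps(method)
        def wrapper(self, *args, **kwargs):
            with self._lock:
                return method(self, *args, **kwargs)
        return wrapper

    for name, member in list(vars(cls).items()):
        if callable(member) and not name.startswith("_"):
            setattr(cls, name, locked(member))

    original_init = cls.__init__
    @wraps(original_init)
    def __init__(self, *args, **kwargs):
        self._lock = threading.RLock()
        original_init(self, *args, **kwargs)
    cls.__init__ = __init__
    return cls

@synchronized
class Account:
    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount   # mutually exclusive with other methods; no explicit locking here

    def withdraw(self, amount):
        self.balance -= amount

acc = Account(10)
acc.deposit(5)
print(acc.balance)               # 15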
Counterexample Driven Quantifier Instantiations with Applications to Distributed Protocols
Orr Tamir, Marcelo Taube, Kenneth L. McMillan, Sharon Shoham, Jon Howell, Guy Gueta, Mooly Sagiv
Formally verifying infinite-state systems can be a daunting task, especially when it comes to reasoning about quantifiers. In particular, quantifier alternations in conjunction with function symbols can create function cycles that result in infinitely many ground terms, making it difficult for solvers to instantiate quantifiers and causing them to diverge. This can leave users with no useful information on how to proceed. To address this issue, we propose an interactive verification methodology that uses a relational abstraction technique to mitigate solver divergence in the presence of quantifiers. This technique abstracts functions in the verification conditions (VCs) as one-to-one relations, which avoids the creation of function cycles and the resulting proliferation of ground terms. Relational abstraction is sound and guarantees correctness if the solver cannot find counter-models. However, it may also lead to false counterexamples, which can be addressed by refining the abstraction and requiring the existence of corresponding elements. In the domain of distributed protocols, we can refine the abstraction by diagnosing counterexamples and manually instantiating elements in the range of the original function. If the verification conditions are correct, there always exist finitely many refinement steps that eliminate all spurious counter-models, making the approach complete. We applied this approach in Ivy to verify the safety properties of consensus protocols and found that: (1) most verification goals can be automatically verified using relational abstraction, while SMT solvers often diverge when given the original VC, (2) only a few manual instantiations were needed, and the counterexamples provided valuable guidance for the user compared to timeouts produced by the traditional approach, and (3) the technique can be used to derive efficient low-level implementations of tricky algorithms.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622864
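The basic shape of the relational abstraction can be illustrated on a schematic verification condition (our notation; the paper's abstraction additionally requires the introduced relations to be one-to-one and refines them on demand):

    ∀x. P(f(x))   ⇝   ∀x ∀y. R_f(x, y) ⇒ P(y)

If R_f is interpreted as the graph of f, the right-hand side is equivalent to the left-hand side, so proving the abstracted condition with R_f uninterpreted is sound. Because R_f need not be total or functional, however, the solver may also report counter-models that correspond to no real execution, which is exactly what the refinement step described above repairs.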
Structural Subtyping as Parametric Polymorphism
Wenhao Tang, Daniel Hillerström, James McKinna, Michel Steuwer, Ornela Dardha, Rongxiao Fu, Sam Lindley
Structural subtyping and parametric polymorphism provide similar flexibility and reusability to programmers. For example, both features enable the programmer to provide a wider record as an argument to a function that expects a narrower one. However, the means by which they do so differs substantially, and the precise details of the relationship between them exist, at best, as folklore in the literature. In this paper, we systematically study the relative expressive power of structural subtyping and parametric polymorphism. We focus our investigation on establishing the extent to which parametric polymorphism, in the form of row and presence polymorphism, can encode structural subtyping for variant and record types. We base our study on various Church-style λ-calculi extended with records and variants, different forms of structural subtyping, and row and presence polymorphism. We characterise expressiveness by exhibiting compositional translations between calculi. For each translation we prove a type preservation and operational correspondence result. We also prove a number of non-existence results. By imposing restrictions on both source and target types, we reveal further subtleties in the expressiveness landscape, the restrictions enabling otherwise impossible translations to be defined. More specifically, we prove that full subtyping cannot be encoded via polymorphism, but we show that several restricted forms of subtyping can be encoded via particular forms of polymorphism.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622836
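The record example from the abstract, spelled out in standard notation (not necessarily the paper's): with width subtyping, a function f : {x : Int} → Int accepts the wider record {x = 1, y = 2} because {x : Int, y : Int} <: {x : Int}. With row polymorphism, the same call needs no subsumption: f instead gets the type ∀ρ. {x : Int; ρ} → Int, and the row variable ρ is instantiated with the leftover field (y : Int). The paper's translations make this intuition precise and delimit where it fails.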
Concrete Type Inference for Code Optimization using Machine Learning with SMT Solving
Fangke Ye, Jisheng Zhao, Jun Shirako, Vivek Sarkar
Despite the widespread popularity of dynamically typed languages such as Python, it is well known that they pose significant challenges to code optimization due to the lack of concrete type information. To overcome this limitation, many ahead-of-time optimizing compiler approaches for Python rely on programmers to provide optional type information as a prerequisite for extensive code optimization. Since few programmers provide this information, a large majority of Python applications are executed without the benefit of code optimization, thereby contributing collectively to a significant worldwide wastage of compute and energy resources. In this paper, we introduce a new approach to concrete type inference that is shown to be effective in enabling code optimization for dynamically typed languages, without requiring the programmer to provide any type information. We explore three kinds of type inference algorithms in our approach based on: 1) machine learning models including GPT-4, 2) constraint-based inference based on SMT solving, and 3) a combination of 1) and 2). Our approach then uses the output from type inference to generate multi-version code for a bounded number of concrete type options, while also including a catch-all untyped version for the case when no match is found. The typed versions are then amenable to code optimization. Experimental results show that the combined algorithm in 3) delivers far better precision and performance than the separate algorithms for 1) and 2). The performance improvement due to type inference, in terms of geometric mean speedup across all benchmarks compared to standard Python, when using 3) is 26.4× with Numba as an AOT optimizing back-end and 62.2× with the Intrepydd optimizing compiler as a back-end. These vast performance improvements can have a significant impact on programmers’ productivity, while also reducing their applications’ use of compute and energy resources.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622825
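A hand-written Python sketch of the multi-version output described above (the function names and the runtime guard are ours; in the paper the variants are generated automatically and the typed ones are handed to Numba or Intrepydd): one specialized variant per inferred concrete typing plus an untyped catch-all, with a dispatcher in front.

def dot_typed(xs, ys):
    # Specialized variant for one inferred concrete typing: two equal-length
    # lists of floats. This is the version an optimizing back-end can compile.
    acc = 0.0
    for i in range(len(xs)):
        acc += xs[i] * ys[i]
    return acc

def dot_untyped(xs, ys):
    # Catch-all variant: the original dynamically typed code, no assumptions.
    return sum(x * y for x, y in zip(xs, ys))

def dot(xs, ys):
    # Dispatcher emitted alongside the variants; a real implementation would
    # use far cheaper runtime checks than scanning every element.
    if (isinstance(xs, list) and isinstance(ys, list) and len(xs) == len(ys)
            and all(isinstance(v, float) for v in xs)
            and all(isinstance(v, float) for v in ys)):
        return dot_typed(xs, ys)
    return dot_untyped(xs, ys)

print(dot([1.0, 2.0], [3.0, 4.0]))   # 11.0, via the typed fast path
print(dot([1, 2], [3, 4]))           # 11, via the untyped catch-all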
Adventure of a Lifetime: Extract Method Refactoring for Rust
Sewen Thy, Andreea Costea, Kiran Gopinathan, Ilya Sergey
We present a design and implementation of the automated "Extract Method" refactoring for Rust programs. Even though Extract Method is one of the most well-studied and most widely used automated refactorings in practice, featured in all major IDEs for all popular programming languages, implementing it soundly for Rust is surprisingly non-trivial due to the restrictions of Rust's ownership and lifetime-based type system. In this work, we provide a systematic decomposition of the Extract Method refactoring for Rust programs into a series of program transformations, each concerned with satisfying a particular aspect of Rust type safety, eventually producing a well-typed Rust program. Our key discovery is the formulation of Extract Method as a composition of naive function hoisting and a series of automated program repair procedures that progressively make the resulting program "more well-typed" by relying on the corresponding repair oracles. Those oracles include a novel static intra-procedural ownership analysis that infers correct sharing annotations for the extracted function's parameters, and the lifetime checker of rustc, Rust's reference compiler. We implemented our approach in a tool called REM---an automated Extract Method refactoring built on top of the IntelliJ IDEA plugin for Rust. Our extensive evaluation on a corpus of changes in five popular Rust projects shows that REM (a) can extract a larger class of feature-rich code fragments into semantically correct functions than other existing refactoring tools, (b) can reproduce method extractions performed manually by human developers in the past, and (c) is efficient enough to be used in interactive development.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622821
Gradual Typing for Effect Handlers
Max S. New, Eric Giovannini, Daniel R. Licata
We present a gradually typed language, GrEff, with effects and handlers that supports migration from unchecked to checked effect typing. This serves as a simple model of the integration of an effect typing discipline with an existing effectful typed language that does not track fine-grained effect information. Our language supports a simple module system to model the programming model of gradual migration from unchecked to checked effect typing in the style of Typed Racket. The surface language GrEff is given semantics by elaboration to a core language Core GrEff. We equip Core GrEff with an inequational theory for reasoning about the semantic error ordering and desired program equivalences for programming with effects and handlers. We derive an operational semantics for the language from the equations provable in the theory. We then show that the theory is sound by constructing an operational logical relations model to prove the graduality theorem. This extends prior work on embedding-projection pair models of gradual typing to handle effect typing and subtyping.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622860
Historia: Refuting Callback Reachability with Message-History Logics
Shawn Meier, Sergio Mover, Gowtham Kaki, Bor-Yuh Evan Chang
This paper considers the callback reachability problem --- determining if a callback can be called by an event-driven framework in an unexpected state. Event-driven programming frameworks are pervasive for creating user-interactive applications (apps) on just about every modern platform. Control flow between callbacks is determined by the framework and largely opaque to the programmer. This opacity of the callback control flow not only causes difficulty for the programmer but also complicates the development of static analyses. Previous static analysis techniques address this opacity either by assuming an arbitrary framework implementation or by attempting to eagerly specify all possible callback control flow, but this is either too coarse to prove properties requiring callback-ordering constraints or too burdensome and tricky to get right. Instead, we present a middle way where the callback control flow can be gradually refined in a targeted manner to prove assertions of interest. The key insight behind this middle way is to reason about the history of method invocations at the boundary between app and framework code --- enabling a decoupling of the specification of callback control flow from the analysis of app code. We call the sequence of such boundary-method invocations message histories and develop message-history logics to do this reasoning. In particular, we define the notion of an application-only transition system with boundary transitions, a message-history program logic for programs with such transitions, and a temporal specification logic for capturing callback control flow in a targeted and compositional manner. Then, to utilize the logics in a goal-directed verifier, we define a way to combine, after the fact, an assertion about message histories with a specification of callback control flow. We implemented a prototype message history-based verifier called Historia and provide evidence that our approach is uniquely capable of distinguishing between buggy and fixed versions on challenging examples drawn from real-world issues and that our targeted specification approach enables proving the absence of multi-callback bug patterns in real-world open-source Android apps.
Proceedings of the ACM on Programming Languages, 2023. https://doi.org/10.1145/3622865
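As a minimal illustration of what reasoning about message histories means (our toy, not Historia's logics, and the listener method names are hypothetical rather than Android's actual API surface): the boundary calls between app and framework are recorded as a sequence, and a callback-reachability assertion is checked against that sequence.

# A message history is a list of boundary calls from the app into the framework.
# Assertion: the onClick callback is reachable only if a listener was registered
# and not cleared afterwards.

def click_reachable(history):
    armed = False
    for call in history:
        if call == "setOnClickListener":
            armed = True
        elif call == "clearOnClickListener":
            armed = False
    return armed

assert click_reachable(["setOnClickListener"])
assert not click_reachable(["setOnClickListener", "clearOnClickListener"])
print("both checks pass")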