LaminarIR: compile-time queues for structured streams
Yousun Ko, Bernd Burgstaller, Bernhard Scholz. PLDI 2015. https://doi.org/10.1145/2737924.2737994
Stream programming languages employ FIFO (first-in, first-out) semantics to model data channels between producers and consumers. A FIFO data channel stores tokens in a buffer that is accessed indirectly via read- and write-pointers. This indirect token-access decouples a producer’s write-operations from the read-operations of the consumer, thereby making dataflow implicit. For a compiler, indirect token-access obscures data-dependencies, which renders standard optimizations ineffective and impacts stream program performance negatively. In this paper we propose a transformation for structured stream programming languages such as StreamIt that shifts FIFO buffer management from run-time to compile-time and eliminates splitters and joiners, whose task is to distribute and merge streams. To show the effectiveness of our lowering transformation, we have implemented a StreamIt to C compilation framework. We have developed our own intermediate representation (IR) called LaminarIR, which facilitates the transformation. We report on the enabling effect of the LaminarIR on LLVM’s optimizations, which required the conversion of several standard StreamIt benchmarks from static to randomized input, to prevent computation of partial results at compile-time. We conducted our experimental evaluation on the Intel i7-2600K, AMD Opteron 6378, Intel Xeon Phi 3120A and ARM Cortex-A15 platforms. Our LaminarIR reduces data-communication on average by 35.9% and achieves platform-specific speedups between 3.73x and 4.98x over StreamIt. We reduce memory accesses by more than 60% and achieve energy savings of up to 93.6% on the Intel i7-2600K.
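To make the contrast concrete, here is a minimal C++ sketch, not taken from the paper; names such as Fifo and moving_avg_laminar are invented. The first version reaches tokens indirectly through read/write pointers, so the compiler sees opaque buffer accesses; the second is the shape a compile-time-resolved queue takes, with tokens named directly so dataflow is explicit.

    #include <cstddef>

    // Run-time FIFO channel: tokens live in a buffer and are reached only
    // through read/write pointers, hiding which push feeds which pop.
    struct Fifo {
        float buf[8];
        std::size_t rd = 0, wr = 0;
        void push(float v) { buf[wr++ % 8] = v; }
        float pop() { return buf[rd++ % 8]; }
    };

    float moving_avg_runtime(Fifo& in) {
        return (in.pop() + in.pop() + in.pop()) / 3.0f;  // indirect token access
    }

    // Compile-time queue: under a static schedule the same tokens become named
    // values, so data dependences are visible and standard optimizations apply.
    float moving_avg_laminar(float t0, float t1, float t2) {
        return (t0 + t1 + t2) / 3.0f;
    }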
{"title":"LaminarIR: compile-time queues for structured streams","authors":"Yousun Ko, Bernd Burgstaller, Bernhard Scholz","doi":"10.1145/2737924.2737994","DOIUrl":"https://doi.org/10.1145/2737924.2737994","url":null,"abstract":"Stream programming languages employ FIFO (first-in, first-out) semantics to model data channels between producers and consumers. A FIFO data channel stores tokens in a buffer that is accessed indirectly via read- and write-pointers. This indirect token-access decouples a producer’s write-operations from the read-operations of the consumer, thereby making dataflow implicit. For a compiler, indirect token-access obscures data-dependencies, which renders standard optimizations ineffective and impacts stream program performance negatively. In this paper we propose a transformation for structured stream programming languages such as StreamIt that shifts FIFO buffer management from run-time to compile-time and eliminates splitters and joiners, whose task is to distribute and merge streams. To show the effectiveness of our lowering transformation, we have implemented a StreamIt to C compilation framework. We have developed our own intermediate representation (IR) called LaminarIR, which facilitates the transformation. We report on the enabling effect of the LaminarIR on LLVM’s optimizations, which required the conversion of several standard StreamIt benchmarks from static to randomized input, to prevent computation of partial results at compile-time. We conducted our experimental evaluation on the Intel i7-2600K, AMD Opteron 6378, Intel Xeon Phi 3120A and ARM Cortex-A15 platforms. Our LaminarIR reduces data-communication on average by 35.9% and achieves platform-specific speedups between 3.73x and 4.98x over StreamIt. We reduce memory accesses by more than 60% and achieve energy savings of up to 93.6% on the Intel i7-2600K.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126364664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Panchekha, Alex Sanchez-Stern, James R. Wilcox, Zachary Tatlock. PLDI 2015. https://doi.org/10.1145/2737924.2737959
Scientific and engineering applications depend on floating point arithmetic to approximate real arithmetic. This approximation introduces rounding error, which can accumulate to produce unacceptable results. While the numerical methods literature provides techniques to mitigate rounding error, applying these techniques requires manually rearranging expressions and understanding the finer details of floating point arithmetic. We introduce Herbie, a tool which automatically discovers the rewrites experts perform to improve accuracy. Herbie's heuristic search estimates and localizes rounding error using sampled points (rather than static error analysis), applies a database of rules to generate improvements, takes series expansions, and combines improvements for different input regions. We evaluated Herbie on examples from a classic numerical methods textbook, and found that Herbie was able to improve accuracy on each example, some by up to 60 bits, while imposing a median performance overhead of 40%. Colleagues in machine learning have used Herbie to significantly improve the results of a clustering algorithm, and a mathematical library has accepted two patches generated using Herbie.
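A textbook instance of such a rewrite (illustrative here; not claimed to be one of Herbie's benchmark cases): for large x, sqrt(x+1) - sqrt(x) cancels catastrophically, while the algebraically equivalent form 1/(sqrt(x+1) + sqrt(x)) stays accurate.

    #include <cmath>
    #include <cstdio>

    double naive(double x)     { return std::sqrt(x + 1.0) - std::sqrt(x); }
    double rewritten(double x) { return 1.0 / (std::sqrt(x + 1.0) + std::sqrt(x)); }

    int main() {
        double x = 1e15;
        std::printf("naive:     %.17g\n", naive(x));      // only a few correct digits
        std::printf("rewritten: %.17g\n", rewritten(x));  // accurate, close to 1/(2*sqrt(x))
        return 0;
    }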
{"title":"Automatically improving accuracy for floating point expressions","authors":"P. Panchekha, Alex Sanchez-Stern, James R. Wilcox, Zachary Tatlock","doi":"10.1145/2737924.2737959","DOIUrl":"https://doi.org/10.1145/2737924.2737959","url":null,"abstract":"Scientific and engineering applications depend on floating point arithmetic to approximate real arithmetic. This approximation introduces rounding error, which can accumulate to produce unacceptable results. While the numerical methods literature provides techniques to mitigate rounding error, applying these techniques requires manually rearranging expressions and understanding the finer details of floating point arithmetic. We introduce Herbie, a tool which automatically discovers the rewrites experts perform to improve accuracy. Herbie's heuristic search estimates and localizes rounding error using sampled points (rather than static error analysis), applies a database of rules to generate improvements, takes series expansions, and combines improvements for different input regions. We evaluated Herbie on examples from a classic numerical methods textbook, and found that Herbie was able to improve accuracy on each example, some by up to 60 bits, while imposing a median performance overhead of 40%. Colleagues in machine learning have used Herbie to significantly improve the results of a clustering algorithm, and a mathematical library has accepted two patches generated using Herbie.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129035605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
W. Ding, Xulong Tang, M. Kandemir, Yuanrui Zhang, Emre Kultursay. PLDI 2015. https://doi.org/10.1145/2737924.2737989
In a network-on-chip (NoC) based manycore architecture, an off-chip data access (main memory access) needs to travel through the on-chip network, spending a considerable amount of time within the chip (in addition to the memory access latency). In addition, it contends with on-chip (cache) accesses as both use the same NoC resources. In this paper, focusing on data-parallel, multithreaded applications, we propose a compiler-based off-chip data access localization strategy, which places data elements in the memory space such that an off-chip access traverses a minimum number of links (hops) to reach the memory controller that handles this access. This brings three main benefits. First, the network latency of off-chip accesses gets reduced; second, the network latency of on-chip accesses gets reduced; and finally, the memory latency of off-chip accesses improves, due to reduced queue latencies. We present an experimental evaluation of our optimization strategy using a set of 13 multithreaded application programs under both private and shared last-level caches. The results collected emphasize the importance of optimizing the off-chip data accesses.
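As a rough illustration of the placement objective (a deliberately simplified sketch, not the paper's algorithm; the 8x8 mesh and corner controllers are assumed): bind each data element to the memory controller reachable in the fewest links from the core that predominantly accesses it.

    #include <array>
    #include <cstdlib>

    struct Tile { int x, y; };

    // Hop (link) count between two tiles on a 2D mesh.
    int hops(Tile a, Tile b) { return std::abs(a.x - b.x) + std::abs(a.y - b.y); }

    // Assumed topology: an 8x8 mesh with a memory controller at each corner.
    const std::array<Tile, 4> controllers{{ {0, 0}, {7, 0}, {0, 7}, {7, 7} }};

    // Pick the controller whose address range should host data used by this core.
    int pick_controller(Tile accessing_core) {
        int best = 0;
        for (int i = 1; i < 4; ++i)
            if (hops(accessing_core, controllers[i]) < hops(accessing_core, controllers[best]))
                best = i;
        return best;
    }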
{"title":"Optimizing off-chip accesses in multicores","authors":"W. Ding, Xulong Tang, M. Kandemir, Yuanrui Zhang, Emre Kultursay","doi":"10.1145/2737924.2737989","DOIUrl":"https://doi.org/10.1145/2737924.2737989","url":null,"abstract":"In a network-on-chip (NoC) based manycore architecture, an off-chip data access (main memory access) needs to travel through the on-chip network, spending considerable amount of time within the chip (in addition to the memory access latency). In addition, it contends with on-chip (cache) accesses as both use the same NoC resources. In this paper, focusing on data-parallel, multithreaded applications, we propose a compiler-based off-chip data access localization strategy, which places data elements in the memory space such that an off-chip access traverses a minimum number of links (hops) to reach the memory controller that handles this access. This brings three main benefits. First, the network latency of off-chip accesses gets reduced; second, the network latency of on-chip accesses gets reduced; and finally, the memory latency of off-chip accesses improves, due to reduced queue latencies. We present an experimental evaluation of our optimization strategy using a set of 13 multithreaded application programs under both private and shared last-level caches. The results collected emphasize the importance of optimizing the off-chip data accesses.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133689694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yusheng Weijiang, S. Balakrishna, Jianqiao Liu, Milind Kulkarni. PLDI 2015. https://doi.org/10.1145/2737924.2737972
We develop a new framework for analyzing recursive methods that perform traversals over trees, called tree dependence analysis. This analysis translates dependence analysis techniques for regular programs to the irregular space, identifying the structure of dependences within a recursive method that traverses trees. We develop a dependence test that exploits the dependence structure of such programs, and can prove that several locality- and parallelism-enhancing transformations are legal. In addition, we extend our analysis with a novel path-dependent, conditional analysis to refine the dependence test and prove the legality of transformations for a wider range of algorithms. We then use these analyses to show that several common algorithms that manipulate trees recursively are amenable to several locality- and parallelism-enhancing transformations. This work shows that classical dependence analysis techniques, which have largely been confined to nested loops over array data structures, can be extended and translated to work for complex, recursive programs that operate over pointer-based data structures.
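For intuition, here is the kind of recursive traversal such an analysis targets (illustrative C++, not from the paper): each call below writes only into its own subtree, so the two recursive calls carry no dependence between them and can legally be reordered or run in parallel; a traversal that also read or wrote data in the sibling subtree would not enjoy this.

    struct Node {
        int value = 0;
        int sum = 0;          // filled in by the traversal
        Node* left = nullptr;
        Node* right = nullptr;
    };

    void subtree_sums(Node* n) {
        if (!n) return;
        subtree_sums(n->left);   // touches only the left subtree
        subtree_sums(n->right);  // touches only the right subtree
        n->sum = n->value
               + (n->left ? n->left->sum : 0)
               + (n->right ? n->right->sum : 0);
    }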
{"title":"Tree dependence analysis","authors":"Yusheng Weijiang, S. Balakrishna, Jianqiao Liu, Milind Kulkarni","doi":"10.1145/2737924.2737972","DOIUrl":"https://doi.org/10.1145/2737924.2737972","url":null,"abstract":"We develop a new framework for analyzing recursive methods that perform traversals over trees, called tree dependence analysis. This analysis translates dependence analysis techniques for regular programs to the irregular space, identifying the structure of dependences within a recursive method that traverses trees. We develop a dependence test that exploits the dependence structure of such programs, and can prove that several locality- and parallelism- enhancing transformations are legal. In addition, we extend our analysis with a novel path-dependent, conditional analysis to refine the dependence test and prove the legality of transformations for a wider range of algorithms. We then use these analyses to show that several common algorithms that manipulate trees recursively are amenable to several locality- and parallelism-enhancing transformations. This work shows that classical dependence analysis techniques, which have largely been confined to nested loops over array data structures, can be extended and translated to work for complex, recursive programs that operate over pointer-based data structures.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"604 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130428841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The power of linear and affine logic lies in their ability to model state change. However, in a trustless, peer-to-peer setting, it is difficult to force principals to commit to state changes. We show how to solve the peer-to-peer affine commitment problem using a generalization of Bitcoin in which transactions deal in types rather than numbers. This has applications to proof-carrying authorization and mechanically executable contracts. Importantly, our system can be---and is---implemented on top of the existing Bitcoin network, so there is no need to recruit computing power to a new protocol.
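As an analogy only (this is not the paper's Bitcoin-based protocol, and the AffineToken class is invented): the affine discipline means a resource may be used at most once, which a C++ move-only handle can mimic by being consumed when it is spent.

    #include <stdexcept>

    class AffineToken {
        bool live_ = true;
    public:
        AffineToken() = default;
        AffineToken(const AffineToken&) = delete;              // no duplication
        AffineToken& operator=(const AffineToken&) = delete;
        AffineToken(AffineToken&& other) noexcept : live_(other.live_) {
            other.live_ = false;                               // the source is consumed
        }
        void spend() {                                         // use at most once
            if (!live_) throw std::logic_error("token already spent");
            live_ = false;
        }
    };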
{"title":"Peer-to-peer affine commitment using bitcoin","authors":"Karl Crary, Michael J. Sullivan","doi":"10.1145/2737924.2737997","DOIUrl":"https://doi.org/10.1145/2737924.2737997","url":null,"abstract":"The power of linear and affine logic lies in their ability to model state change. However, in a trustless, peer-to-peer setting, it is difficult to force principals to commit to state changes. We show how to solve the peer-to-peer affine commitment problem using a generalization of Bitcoin in which transactions deal in types rather than numbers. This has applications to proof-carrying authorization and mechanically executable contracts. Importantly, our system can be---and is---implemented on top of the existing Bitcoin network, so there is no need to recruit computing power to a new protocol.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130771116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing algorithmic debuggers for Haskell require a transformation of all modules in a program, even libraries that the user does not want to debug and which may use language features not supported by the debugger. This is a pity, because it leaves a promising approach to debugging inapplicable to many real-world programs. We use the cost centre stack from the Glasgow Haskell Compiler profiling environment together with runtime value observations as provided by the Haskell Object Observation Debugger (HOOD) to collect enough information for algorithmic debugging. Program annotations are required in suspected modules only. With this technique algorithmic debugging is applicable to a much larger set of Haskell programs. This demonstrates that for functional languages in general a simple stack trace extension is useful to support tasks such as profiling and debugging.
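For readers unfamiliar with algorithmic debugging, a small language-neutral sketch (written in C++ here; the paper's tooling is for Haskell and the Observation type is invented): the debugger records call observations, asks the user to judge each recorded result, and reports a call whose result is wrong while all of the calls it made are right.

    #include <cstdio>
    #include <string>
    #include <vector>

    struct Observation {
        std::string call;           // e.g. "insert 3 [1,2]"
        std::string result;         // e.g. "[1,2]" (wrong)
        bool correct;               // the user's verdict
        std::vector<int> children;  // indices of the calls it made
    };

    // A wrong node whose children are all right localizes the fault.
    void report_faults(const std::vector<Observation>& trace) {
        for (const Observation& o : trace) {
            if (o.correct) continue;
            bool children_ok = true;
            for (int c : o.children)
                children_ok = children_ok && trace[c].correct;
            if (children_ok)
                std::printf("fault in: %s => %s\n", o.call.c_str(), o.result.c_str());
        }
    }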
{"title":"Algorithmic debugging of real-world haskell programs: deriving dependencies from the cost centre stack","authors":"M. Faddegon, O. Chitil","doi":"10.1145/2737924.2737985","DOIUrl":"https://doi.org/10.1145/2737924.2737985","url":null,"abstract":"Existing algorithmic debuggers for Haskell require a transformation of all modules in a program, even libraries that the user does not want to debug and which may use language features not supported by the debugger. This is a pity, because a promising approach to debugging is therefore not applicable to many real-world programs. We use the cost centre stack from the Glasgow Haskell Compiler profiling environment together with runtime value observations as provided by the Haskell Object Observation Debugger (HOOD) to collect enough information for algorithmic debugging. Program annotations are in suspected modules only. With this technique algorithmic debugging is applicable to a much larger set of Haskell programs. This demonstrates that for functional languages in general a simple stack trace extension is useful to support tasks such as profiling and debugging.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134358796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient implementations of concurrent objects such as semaphores, locks, and atomic collections are essential to modern computing. Programming such objects is error prone: in minimizing the synchronization overhead between concurrent object invocations, one risks breaking conformance to reference implementations — or in formal terms, one risks violating observational refinement. Precisely testing this refinement even within a single execution is intractable, limiting existing approaches to executions with very few object invocations. We develop scalable and effective algorithms for detecting refinement violations. Our algorithms are founded on incremental, symbolic reasoning, and exploit foundational insights into the refinement-checking problem. Our approach is sound, in that we detect only actual violations, and scales far beyond existing violation-detection algorithms. Empirically, we find that our approach is practically complete, in that we detect the violations arising in actual executions.
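To ground the terminology, here is the kind of single-execution history such a monitor inspects (illustrative only; the Event encoding is invented): for an atomic queue, the observed history below admits no sequential queue behavior, so it witnesses a refinement violation.

    #include <string>
    #include <vector>

    struct Event { std::string op; int value; };  // invocation/response collapsed

    const std::vector<Event> history = {
        {"enq", 1},
        {"enq", 2},
        {"deq", 1},
        {"deq", 1},  // 1 was enqueued once but dequeued twice: no atomic queue allows this
    };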
{"title":"Monitoring refinement via symbolic reasoning","authors":"M. Emmi, C. Enea, Jad Hamza","doi":"10.1145/2737924.2737983","DOIUrl":"https://doi.org/10.1145/2737924.2737983","url":null,"abstract":"Efficient implementations of concurrent objects such as semaphores, locks, and atomic collections are essential to modern computing. Programming such objects is error prone: in minimizing the synchronization overhead between concurrent object invocations, one risks the conformance to reference implementations — or in formal terms, one risks violating observational refinement. Precisely testing this refinement even within a single execution is intractable, limiting existing approaches to executions with very few object invocations. We develop scalable and effective algorithms for detecting refinement violations. Our algorithms are founded on incremental, symbolic reasoning, and exploit foundational insights into the refinement-checking problem. Our approach is sound, in that we detect only actual violations, and scales far beyond existing violation-detection algorithms. Empirically, we find that our approach is practically complete, in that we detect the violations arising in actual executions.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129049121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Read-Copy-Update (RCU) is a technique for letting multiple readers safely access a data structure while a writer concurrently modifies it. It is used heavily in the Linux kernel in situations where fast reads are important and writes are infrequent. Optimized implementations rely only on the weaker memory orderings provided by modern hardware, avoiding the need for expensive synchronization instructions (such as memory barriers) as much as possible. Using GPS, a recently developed program logic for the C/C++11 memory model, we verify an implementation of RCU for a singly-linked list assuming "release-acquire" semantics. Although release-acquire synchronization is stronger than what is required by real RCU implementations, it is nonetheless significantly weaker than the assumption of sequential consistency made in prior work on RCU verification. Ours is the first formal proof of correctness for an implementation of RCU under a weak memory model.
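A minimal sketch of the release-acquire publication pattern at the heart of such a list (C++11 atomics, matching the C/C++11 model the paper targets; reclamation, grace periods, and writer-side locking are omitted, and a single writer is assumed): the writer initializes a node fully and publishes it with a release store, while readers traverse with acquire loads and never block.

    #include <atomic>

    struct Node {
        int value;
        std::atomic<Node*> next;
        Node(int v, Node* n) : value(v), next(n) {}
    };

    std::atomic<Node*> head{nullptr};

    // Reader: may run concurrently with the writer below.
    bool contains(int v) {
        for (Node* p = head.load(std::memory_order_acquire); p != nullptr;
             p = p->next.load(std::memory_order_acquire))
            if (p->value == v) return true;
        return false;
    }

    // Single writer: publish a fully initialized node with a release store.
    void push_front(int v) {
        Node* n = new Node(v, head.load(std::memory_order_relaxed));
        head.store(n, std::memory_order_release);  // publication point
    }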
{"title":"Verifying read-copy-update in a logic for weak memory","authors":"Joseph Tassarotti, Derek Dreyer, Viktor Vafeiadis","doi":"10.1145/2737924.2737992","DOIUrl":"https://doi.org/10.1145/2737924.2737992","url":null,"abstract":"Read-Copy-Update (RCU) is a technique for letting multiple readers safely access a data structure while a writer concurrently modifies it. It is used heavily in the Linux kernel in situations where fast reads are important and writes are infrequent. Optimized implementations rely only on the weaker memory orderings provided by modern hardware, avoiding the need for expensive synchronization instructions (such as memory barriers) as much as possible. Using GPS, a recently developed program logic for the C/C++11 memory model, we verify an implementation of RCU for a singly-linked list assuming \"release-acquire\" semantics. Although release-acquire synchronization is stronger than what is required by real RCU implementations, it is nonetheless significantly weaker than the assumption of sequential consistency made in prior work on RCU verification. Ours is the first formal proof of correctness for an implementation of RCU under a weak memory model.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116266034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Numerical abstract domains are a fundamental component in modern static program analysis and are used in a wide range of scenarios (e.g. computing array bounds, disjointness, etc). However, analysis with these domains can be very expensive, deeply affecting the scalability and practical applicability of the static analysis. Hence, it is critical to ensure that these domains are made highly efficient. In this work, we present a complete approach for optimizing the performance of the Octagon numerical abstract domain, a domain shown to be particularly effective in practice. Our optimization approach is based on two key insights: i) the ability to perform online decomposition of the octagons leading to a massive reduction in operation counts, and ii) leveraging classic performance optimizations from linear algebra such as vectorization, locality of reference, scalar replacement and others, for improving the key bottlenecks of the domain. Applying these ideas, we designed new algorithms for the core Octagon operators with better asymptotic runtime than prior work and combined them with the optimization techniques to achieve high actual performance. We implemented our approach in the Octagon operators exported by the popular APRON C library, thus enabling existing static analyzers using APRON to immediately benefit from our work. To demonstrate the performance benefits of our approach, we evaluated our framework on three published static analyzers showing massive speed-ups for the time spent in Octagon analysis (e.g., up to 146x) as well as significant end-to-end program analysis speed-ups (up to 18.7x). Based on these results, we believe that our framework can serve as a new basis for static analysis with the Octagon numerical domain.
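To see where the cycles go, here is a simplified sketch of the closure that dominates Octagon analysis time (the real strong closure runs over a matrix indexed by both signs of each variable and adds a strengthening step, both omitted here): a Floyd-Warshall-style triple loop whose unit-stride inner loop is exactly what vectorization and locality optimizations target.

    #include <algorithm>
    #include <vector>

    // m is an n*n row-major matrix of bounds; absent constraints are large values.
    void closure(std::vector<double>& m, int n) {
        for (int k = 0; k < n; ++k)
            for (int i = 0; i < n; ++i) {
                double mik = m[i * n + k];
                for (int j = 0; j < n; ++j)  // contiguous accesses: SIMD-friendly
                    m[i * n + j] = std::min(m[i * n + j], mik + m[k * n + j]);
            }
    }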
{"title":"Making numerical program analysis fast","authors":"Gagandeep Singh, Markus Püschel, Martin T. Vechev","doi":"10.1145/2737924.2738000","DOIUrl":"https://doi.org/10.1145/2737924.2738000","url":null,"abstract":"Numerical abstract domains are a fundamental component in modern static program analysis and are used in a wide range of scenarios (e.g. computing array bounds, disjointness, etc). However, analysis with these domains can be very expensive, deeply affecting the scalability and practical applicability of the static analysis. Hence, it is critical to ensure that these domains are made highly efficient. In this work, we present a complete approach for optimizing the performance of the Octagon numerical abstract domain, a domain shown to be particularly effective in practice. Our optimization approach is based on two key insights: i) the ability to perform online decomposition of the octagons leading to a massive reduction in operation counts, and ii) leveraging classic performance optimizations from linear algebra such as vectorization, locality of reference, scalar replacement and others, for improving the key bottlenecks of the domain. Applying these ideas, we designed new algorithms for the core Octagon operators with better asymptotic runtime than prior work and combined them with the optimization techniques to achieve high actual performance. We implemented our approach in the Octagon operators exported by the popular APRON C library, thus enabling existing static analyzers using APRON to immediately benefit from our work. To demonstrate the performance benefits of our approach, we evaluated our framework on three published static analyzers showing massive speed-ups for the time spent in Octagon analysis (e.g., up to 146x) as well as significant end-to-end program analysis speed-ups (up to 18.7x). Based on these results, we believe that our framework can serve as a new basis for static analysis with the Octagon numerical domain.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126848098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C#, Dart, Pyret, Racket, TypeScript, VB: many recent languages integrate dynamic and static types via gradual typing. We systematically develop three calculi for gradual typing and the relations between them, building on and strengthening previous work. The calculi are: λB, based on the blame calculus of Wadler and Findler (2009); λC, inspired by the coercion calculus of Henglein (1994); λS inspired by the space-efficient calculus of Herman, Tomb, and Flanagan (2006) and the threesome calculus of Siek and Wadler (2010). While λB is little changed from previous work, λC and λS are new. Together, λB, λC, and λS provide a coherent foundation for design, implementation, and optimisation of gradual types. We define translations from λB to λC and from λC to λS. Much previous work lacked proofs of correctness or had weak correctness criteria; here we demonstrate the strongest correctness criterion one could hope for, that each of the translations is fully abstract. Each of the calculi reinforces the design of the others: λC has a particularly simple definition, and the subtle definition of blame safety for λB is justified by the simple definition of blame safety for λC. Our calculus λS is implementation-ready: the first space-efficient calculus that is both straightforward to implement and easy to understand. We give two applications: first, using full abstraction from λC to λS to validate the challenging part of full abstraction between λB and λC; and, second, using full abstraction from λB to λS to easily establish the Fundamental Property of Casts, which required a custom bisimulation and six lemmas in earlier work.
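For intuition, a small worked example in approximate Henglein-style coercion notation (the paper's exact syntax may differ): casting an Int into the dynamic type ⋆ compiles to an injection coercion, casting back out compiles to a projection carrying a blame label p, and the space-efficient calculus stays compact by normalizing compositions of coercions as they meet.

    % approximate notation; p is a blame label
    (\mathsf{Int} \Rightarrow \star) \rightsquigarrow \mathsf{Int}!
    \qquad
    (\star \Rightarrow^{p} \mathsf{Int}) \rightsquigarrow \mathsf{Int}?^{p}
    \qquad
    \mathsf{Int}!\ ;\ \mathsf{Int}?^{p} \longrightarrow \iota
    % a mismatched pair such as Bool! ; Int?^p instead normalizes to a failure coercion that blames p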
{"title":"Blame and coercion: together again for the first time","authors":"Jeremy G. Siek, Peter Thiemann, P. Wadler","doi":"10.1145/2737924.2737968","DOIUrl":"https://doi.org/10.1145/2737924.2737968","url":null,"abstract":"C#, Dart, Pyret, Racket, TypeScript, VB: many recent languages integrate dynamic and static types via gradual typing. We systematically develop three calculi for gradual typing and the relations between them, building on and strengthening previous work. The calculi are: λB, based on the blame calculus of Wadler and Findler (2009); λC, inspired by the coercion calculus of Henglein (1994); λS inspired by the space-efficient calculus of Herman, Tomb, and Flanagan (2006) and the threesome calculus of Siek and Wadler (2010). While λB is little changed from previous work, λC and λS are new. Together, λB, λC, and λS provide a coherent foundation for design, implementation, and optimisation of gradual types. We define translations from λB to λC and from λC to λS. Much previous work lacked proofs of correctness or had weak correctness criteria; here we demonstrate the strongest correctness criterion one could hope for, that each of the translations is fully abstract. Each of the calculi reinforces the design of the others: λC has a particularly simple definition, and the subtle definition of blame safety for λB is justified by the simple definition of blame safety for λC. Our calculus λS is implementation-ready: the first space-efficient calculus that is both straightforward to implement and easy to understand. We give two applications: first, using full abstraction from λC to λS to validate the challenging part of full abstraction between λB and λC; and, second, using full abstraction from λB to λS to easily establish the Fundamental Property of Casts, which required a custom bisimulation and six lemmas in earlier work.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126919307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}