
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation — Latest Publications

LaminarIR: compile-time queues for structured streams
Yousun Ko, Bernd Burgstaller, Bernhard Scholz
Stream programming languages employ FIFO (first-in, first-out) semantics to model data channels between producers and consumers. A FIFO data channel stores tokens in a buffer that is accessed indirectly via read- and write-pointers. This indirect token-access decouples a producer’s write-operations from the read-operations of the consumer, thereby making dataflow implicit. For a compiler, indirect token-access obscures data-dependencies, which renders standard optimizations ineffective and impacts stream program performance negatively. In this paper we propose a transformation for structured stream programming languages such as StreamIt that shifts FIFO buffer management from run-time to compile-time and eliminates splitters and joiners, whose task is to distribute and merge streams. To show the effectiveness of our lowering transformation, we have implemented a StreamIt to C compilation framework. We have developed our own intermediate representation (IR) called LaminarIR, which facilitates the transformation. We report on the enabling effect of the LaminarIR on LLVM’s optimizations, which required the conversion of several standard StreamIt benchmarks from static to randomized input, to prevent computation of partial results at compile-time. We conducted our experimental evaluation on the Intel i7-2600K, AMD Opteron 6378, Intel Xeon Phi 3120A and ARM Cortex-A15 platforms. Our LaminarIR reduces data-communication on average by 35.9% and achieves platform-specific speedups between 3.73x and 4.98x over StreamIt. We reduce memory accesses by more than 60% and achieve energy savings of up to 93.6% on the Intel i7-2600K.
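The pointer-indirection problem and the lowering the abstract describes can be illustrated with a minimal sketch (a Python stand-in, not the authors' StreamIt/LaminarIR implementation):

```python
from collections import deque

# Run-time FIFO channel: tokens pass through a buffer accessed via
# read/write pointers, hiding the producer->consumer dependence
# from the compiler.
def fifo_pipeline(xs):
    q = deque()
    out = []
    for x in xs:
        q.append(x * 2)              # producer writes into the buffer
    while q:
        out.append(q.popleft() + 1)  # consumer reads indirectly
    return out

# "Lowered" version: the buffer is resolved at compile time into
# direct def-use chains, so each token flows through a named location
# and the producer->consumer dependence is explicit and optimizable.
def lowered_pipeline(xs):
    return [(x * 2) + 1 for x in xs]  # token never touches a queue

assert fifo_pipeline([1, 2, 3]) == lowered_pipeline([1, 2, 3]) == [3, 5, 7]
```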
Cited by: 6
Automatically improving accuracy for floating point expressions
P. Panchekha, Alex Sanchez-Stern, James R. Wilcox, Zachary Tatlock
Scientific and engineering applications depend on floating point arithmetic to approximate real arithmetic. This approximation introduces rounding error, which can accumulate to produce unacceptable results. While the numerical methods literature provides techniques to mitigate rounding error, applying these techniques requires manually rearranging expressions and understanding the finer details of floating point arithmetic. We introduce Herbie, a tool which automatically discovers the rewrites experts perform to improve accuracy. Herbie's heuristic search estimates and localizes rounding error using sampled points (rather than static error analysis), applies a database of rules to generate improvements, takes series expansions, and combines improvements for different input regions. We evaluated Herbie on examples from a classic numerical methods textbook, and found that Herbie was able to improve accuracy on each example, some by up to 60 bits, while imposing a median performance overhead of 40%. Colleagues in machine learning have used Herbie to significantly improve the results of a clustering algorithm, and a mathematical library has accepted two patches generated using Herbie.
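The kind of rewrite Herbie discovers can be illustrated with a textbook case of catastrophic cancellation (this particular expression is an illustration, not necessarily one of the paper's benchmarks):

```python
import math

def naive(x):
    # For large x the two square roots are nearly equal, so the
    # subtraction cancels most significant bits.
    return math.sqrt(x + 1) - math.sqrt(x)

def rewritten(x):
    # Algebraically equal form obtained by multiplying by the
    # conjugate; no subtraction of nearby values, so it stays accurate.
    return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))

# At x = 1e15 the naive form retains only a few correct bits,
# while the rewritten form is accurate to nearly full precision.
x = 1e15
```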
Cited by: 176
Optimizing off-chip accesses in multicores
W. Ding, Xulong Tang, M. Kandemir, Yuanrui Zhang, Emre Kultursay
In a network-on-chip (NoC) based manycore architecture, an off-chip data access (main memory access) needs to travel through the on-chip network, spending a considerable amount of time within the chip (in addition to the memory access latency). In addition, it contends with on-chip (cache) accesses as both use the same NoC resources. In this paper, focusing on data-parallel, multithreaded applications, we propose a compiler-based off-chip data access localization strategy, which places data elements in the memory space such that an off-chip access traverses a minimum number of links (hops) to reach the memory controller that handles this access. This brings three main benefits. First, the network latency of off-chip accesses gets reduced; second, the network latency of on-chip accesses gets reduced; and finally, the memory latency of off-chip accesses improves, due to reduced queue latencies. We present an experimental evaluation of our optimization strategy using a set of 13 multithreaded application programs under both private and shared last-level caches.
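A toy model of the placement idea, assuming a 2D mesh where the hop count between two tiles is their Manhattan distance (coordinates and controller positions here are made up for illustration):

```python
# In a 2D-mesh NoC, an off-chip access from a core must travel to the
# memory controller that owns the address; placing the data under the
# nearest controller minimizes the number of links (hops) traversed.

def hops(core, controller):
    (cx, cy), (mx, my) = core, controller
    return abs(cx - mx) + abs(cy - my)  # Manhattan distance in the mesh

# Hypothetical 4x4 mesh with memory controllers at the corners.
controllers = [(0, 0), (0, 3), (3, 0), (3, 3)]

def nearest_controller(core):
    return min(controllers, key=lambda m: hops(core, m))

core = (1, 2)
best = nearest_controller(core)
assert hops(core, best) == min(hops(core, m) for m in controllers)
```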
Cited by: 21
Tree dependence analysis
Yusheng Weijiang, S. Balakrishna, Jianqiao Liu, Milind Kulkarni
We develop a new framework for analyzing recursive methods that perform traversals over trees, called tree dependence analysis. This analysis translates dependence analysis techniques for regular programs to the irregular space, identifying the structure of dependences within a recursive method that traverses trees. We develop a dependence test that exploits the dependence structure of such programs, and can prove that several locality- and parallelism- enhancing transformations are legal. In addition, we extend our analysis with a novel path-dependent, conditional analysis to refine the dependence test and prove the legality of transformations for a wider range of algorithms. We then use these analyses to show that several common algorithms that manipulate trees recursively are amenable to several locality- and parallelism-enhancing transformations. This work shows that classical dependence analysis techniques, which have largely been confined to nested loops over array data structures, can be extended and translated to work for complex, recursive programs that operate over pointer-based data structures.
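A minimal illustration of the dependence structure such an analysis exploits: when the per-node work touches only that node, sibling subtrees are independent, so reordering (or parallelizing) the recursive calls is a legal transformation (a sketch, not the paper's framework):

```python
class Node:
    def __init__(self, val, kids=()):
        self.val, self.kids = val, list(kids)

# Recursive traversal whose per-node work reads and writes only that
# node: there are no cross-subtree dependences.
def scale(node, k):
    node.val *= k
    for c in node.kids:
        scale(c, k)

# A reordered traversal (children first, in reverse) produces the same
# final tree -- the transformation a tree dependence test would prove legal.
def scale_reversed(node, k):
    for c in reversed(node.kids):
        scale_reversed(c, k)
    node.val *= k

def values(node):  # pre-order listing, for comparison
    return [node.val] + [v for c in node.kids for v in values(c)]

t1 = Node(1, [Node(2), Node(3, [Node(4)])])
t2 = Node(1, [Node(2), Node(3, [Node(4)])])
scale(t1, 10)
scale_reversed(t2, 10)
assert values(t1) == values(t2) == [10, 20, 30, 40]
```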
Cited by: 13
Peer-to-peer affine commitment using bitcoin
Karl Crary, Michael J. Sullivan
The power of linear and affine logic lies in their ability to model state change. However, in a trustless, peer-to-peer setting, it is difficult to force principals to commit to state changes. We show how to solve the peer-to-peer affine commitment problem using a generalization of Bitcoin in which transactions deal in types rather than numbers. This has applications to proof-carrying authorization and mechanically executable contracts. Importantly, our system can be---and is---implemented on top of the existing Bitcoin network, so there is no need to recruit computing power to a new protocol.
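The affine-resource intuition can be sketched as a toy ledger whose outputs carry types and can be consumed at most once, mirroring Bitcoin's unspent-output rule (a hypothetical API, not the paper's system):

```python
# An affine resource may be used at most once. Bitcoin's UTXO rule
# enforces exactly this for transaction outputs; here the outputs carry
# "types" (labels) rather than coin amounts, as in the paper's idea.

class Ledger:
    def __init__(self):
        self.unspent = {}   # output id -> type of the live output
        self.next_id = 0

    def mint(self, typ):
        oid = self.next_id
        self.next_id += 1
        self.unspent[oid] = typ
        return oid

    def transact(self, inputs, output_types):
        # Consuming an already-spent output is rejected: affinity.
        for oid in inputs:
            if oid not in self.unspent:
                raise ValueError(f"output {oid} already spent")
        for oid in inputs:
            del self.unspent[oid]
        return [self.mint(t) for t in output_types]

led = Ledger()
key = led.mint("Key")                     # an affine capability
(opened,) = led.transact([key], ["Opened"])
try:
    led.transact([key], ["Opened"])       # double-spend attempt
    assert False
except ValueError:
    pass                                  # the ledger rejects reuse
```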
Cited by: 26
Algorithmic debugging of real-world haskell programs: deriving dependencies from the cost centre stack
M. Faddegon, O. Chitil
Existing algorithmic debuggers for Haskell require a transformation of all modules in a program, even libraries that the user does not want to debug and which may use language features not supported by the debugger. This is a pity, because a promising approach to debugging is therefore not applicable to many real-world programs. We use the cost centre stack from the Glasgow Haskell Compiler profiling environment together with runtime value observations as provided by the Haskell Object Observation Debugger (HOOD) to collect enough information for algorithmic debugging. Program annotations are in suspected modules only. With this technique algorithmic debugging is applicable to a much larger set of Haskell programs. This demonstrates that for functional languages in general a simple stack trace extension is useful to support tasks such as profiling and debugging.
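A rough Python analogue of the idea (the actual system is Haskell, and all names here are invented): record each observed call together with its innermost enclosing observed call, yielding the computation tree that algorithmic debugging asks questions about:

```python
import functools

stack, tree = [], []   # current observed-call stack; recorded edges

def observe(f):
    # Like a cost-centre annotation plus a HOOD-style value observation:
    # each call records its parent observed call, its arguments, and
    # its result.
    @functools.wraps(f)
    def wrapper(*args):
        parent = stack[-1] if stack else None
        frame = (f.__name__, args)
        stack.append(frame)
        try:
            result = f(*args)
        finally:
            stack.pop()
        tree.append((parent, frame, result))
        return result
    return wrapper

@observe
def square(x):
    return x * x

@observe
def sum_of_squares(xs):
    return sum(square(x) for x in xs)

sum_of_squares([1, 2, 3])
# `tree` now records, e.g., square(2) -> 4 with parent
# sum_of_squares([1, 2, 3]); an algorithmic debugger walks this tree
# asking "is this reduction correct?" to localize a fault.
```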
Cited by: 9
Monitoring refinement via symbolic reasoning
M. Emmi, C. Enea, Jad Hamza
Efficient implementations of concurrent objects such as semaphores, locks, and atomic collections are essential to modern computing. Programming such objects is error prone: in minimizing the synchronization overhead between concurrent object invocations, one risks the conformance to reference implementations — or in formal terms, one risks violating observational refinement. Precisely testing this refinement even within a single execution is intractable, limiting existing approaches to executions with very few object invocations. We develop scalable and effective algorithms for detecting refinement violations. Our algorithms are founded on incremental, symbolic reasoning, and exploit foundational insights into the refinement-checking problem. Our approach is sound, in that we detect only actual violations, and scales far beyond existing violation-detection algorithms. Empirically, we find that our approach is practically complete, in that we detect the violations arising in actual executions.
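For intuition, the intractable baseline such algorithms improve on is a brute-force check that enumerates orderings of a small history (a sketch using an atomic counter as the reference implementation; the paper's symbolic, incremental approach avoids this enumeration):

```python
from itertools import permutations

# An operation is (start, end, name, result). A history refines the
# atomic counter if SOME ordering that respects real-time order
# (a finished op stays before ops that start later) replays correctly.

def legal(seq):
    n = 0
    for (_s, _e, op, res) in seq:
        if op == "inc":
            n += 1
        elif op == "read" and res != n:
            return False
    return True

def respects_real_time(p):
    for i, a in enumerate(p):
        for b in p[i + 1:]:
            if b[1] < a[0]:   # b finished before a started, yet placed after
                return False
    return True

def linearizable(history):
    return any(respects_real_time(p) and legal(p)
               for p in permutations(history))

h_ok = [(0, 3, "inc", None), (1, 2, "read", 1)]   # read overlaps the inc
h_bad = [(0, 1, "inc", None), (2, 3, "read", 0)]  # later read misses the inc
assert linearizable(h_ok)
assert not linearizable(h_bad)
```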
Cited by: 19
Verifying read-copy-update in a logic for weak memory
Joseph Tassarotti, Derek Dreyer, Viktor Vafeiadis
Read-Copy-Update (RCU) is a technique for letting multiple readers safely access a data structure while a writer concurrently modifies it. It is used heavily in the Linux kernel in situations where fast reads are important and writes are infrequent. Optimized implementations rely only on the weaker memory orderings provided by modern hardware, avoiding the need for expensive synchronization instructions (such as memory barriers) as much as possible. Using GPS, a recently developed program logic for the C/C++11 memory model, we verify an implementation of RCU for a singly-linked list assuming "release-acquire" semantics. Although release-acquire synchronization is stronger than what is required by real RCU implementations, it is nonetheless significantly weaker than the assumption of sequential consistency made in prior work on RCU verification. Ours is the first formal proof of correctness for an implementation of RCU under a weak memory model.
Cited by: 51
Making numerical program analysis fast
Gagandeep Singh, Markus Püschel, Martin T. Vechev
Numerical abstract domains are a fundamental component in modern static program analysis and are used in a wide range of scenarios (e.g. computing array bounds, disjointness, etc). However, analysis with these domains can be very expensive, deeply affecting the scalability and practical applicability of the static analysis. Hence, it is critical to ensure that these domains are made highly efficient. In this work, we present a complete approach for optimizing the performance of the Octagon numerical abstract domain, a domain shown to be particularly effective in practice. Our optimization approach is based on two key insights: i) the ability to perform online decomposition of the octagons leading to a massive reduction in operation counts, and ii) leveraging classic performance optimizations from linear algebra such as vectorization, locality of reference, scalar replacement and others, for improving the key bottlenecks of the domain. Applying these ideas, we designed new algorithms for the core Octagon operators with better asymptotic runtime than prior work and combined them with the optimization techniques to achieve high actual performance. We implemented our approach in the Octagon operators exported by the popular APRON C library, thus enabling existing static analyzers using APRON to immediately benefit from our work. To demonstrate the performance benefits of our approach, we evaluated our framework on three published static analyzers showing massive speed-ups for the time spent in Octagon analysis (e.g., up to 146x) as well as significant end-to-end program analysis speed-ups (up to 18.7x). Based on these results, we believe that our framework can serve as a new basis for static analysis with the Octagon numerical domain.
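The decomposition insight can be sketched with a union-find pass: each octagon constraint (±x ± y ≤ c) relates at most two variables, so the variable set splits into independent blocks and domain operators can run per block instead of over the full n×n matrix (illustrative code, not APRON's implementation):

```python
# Partition the variables of an octagon into independent blocks: two
# variables belong to the same block iff they are (transitively) linked
# by some constraint. Operators then cost O(sum of block sizes^3)
# instead of O(n^3), which is where the operation-count savings come from.

def blocks(nvars, constraints):
    parent = list(range(nvars))

    def find(i):                     # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (x, y, _c) in constraints:   # constraint of shape +/-x +/-y <= c
        parent[find(x)] = find(y)

    groups = {}
    for v in range(nvars):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# 6 variables, but constraints only link {0,1,2} and {3,4}; 5 is free.
cons = [(0, 1, 3), (1, 2, 7), (3, 4, 1)]
assert blocks(6, cons) == [[0, 1, 2], [3, 4], [5]]
```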
Citations: 41
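To make the paper's first insight concrete: a minimal, illustrative sketch (not the paper's APRON implementation; the function name `decompose` is an assumption) of online decomposition. Octagon constraints have the form ±x ± y ≤ c, and when the constraint set splits into blocks of variables that never appear together in a constraint, expensive operators such as the O(n³) closure can run per block instead of over all n variables.

```python
def decompose(num_vars, constraints):
    """Partition variables into independent blocks via union-find.

    constraints: iterable of (i, j, c) meaning a constraint of the
    form +/-x_i +/-x_j <= c relates variables i and j.
    Returns the blocks as a sorted list of sorted variable lists.
    """
    parent = list(range(num_vars))

    def find(a):
        # Path-halving find: walk to the root, shortcutting pointers.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Union the two variables of every binary constraint.
    for i, j, _c in constraints:
        ra, rb = find(i), find(j)
        if ra != rb:
            parent[ra] = rb

    # Group variables by their root representative.
    blocks = {}
    for v in range(num_vars):
        blocks.setdefault(find(v), []).append(v)
    return sorted(sorted(b) for b in blocks.values())

# 6 variables; constraints only relate {0,1} and {2,3}, so closure
# can be run on two 2-variable blocks plus two singletons.
print(decompose(6, [(0, 1, 5), (2, 3, -1)]))
# -> [[0, 1], [2, 3], [4], [5]]
```

The paper performs this decomposition *online*, maintaining the partition as constraints are added and removed during analysis; the sketch above only shows the static partitioning step.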
Blame and coercion: together again for the first time
Jeremy G. Siek, Peter Thiemann, P. Wadler
C#, Dart, Pyret, Racket, TypeScript, VB: many recent languages integrate dynamic and static types via gradual typing. We systematically develop three calculi for gradual typing and the relations between them, building on and strengthening previous work. The calculi are: λB, based on the blame calculus of Wadler and Findler (2009); λC, inspired by the coercion calculus of Henglein (1994); λS inspired by the space-efficient calculus of Herman, Tomb, and Flanagan (2006) and the threesome calculus of Siek and Wadler (2010). While λB is little changed from previous work, λC and λS are new. Together, λB, λC, and λS provide a coherent foundation for design, implementation, and optimisation of gradual types. We define translations from λB to λC and from λC to λS. Much previous work lacked proofs of correctness or had weak correctness criteria; here we demonstrate the strongest correctness criterion one could hope for, that each of the translations is fully abstract. Each of the calculi reinforces the design of the others: λC has a particularly simple definition, and the subtle definition of blame safety for λB is justified by the simple definition of blame safety for λC. Our calculus λS is implementation-ready: the first space-efficient calculus that is both straightforward to implement and easy to understand. We give two applications: first, using full abstraction from λC to λS to validate the challenging part of full abstraction between λB and λC; and, second, using full abstraction from λB to λS to easily establish the Fundamental Property of Casts, which required a custom bisimulation and six lemmas in earlier work.
doi: 10.1145/2737924.2737968 (published 2015-06-03)
Citations: 33
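The core run-time mechanism shared by the three calculi is the cast: a value of one type is checked against another type at run time, and if the check fails, blame falls on the label of the responsible cast. A minimal illustrative sketch (not the paper's calculi; `cast` and `BlameError` are hypothetical names) of a first-order cast from a dynamically typed value:

```python
class BlameError(Exception):
    """Raised when a run-time cast fails; carries the blame label."""
    def __init__(self, label):
        super().__init__(f"blame {label}")
        self.label = label

def cast(value, target_type, label):
    """Cast a dynamically typed value to target_type.

    On success the value passes through unchanged; on failure, blame
    is assigned to `label`, which identifies the offending cast site.
    """
    if isinstance(value, target_type):
        return value
    raise BlameError(label)

print(cast(42, int, "l1"))   # the cast succeeds -> 42
try:
    cast(42, str, "l2")      # the cast fails, blaming label l2
except BlameError as e:
    print(e)                 # -> blame l2
```

The paper's contribution concerns higher-order casts (on function types), where a cast cannot be checked immediately and must wrap the function, splitting blame between caller and callee; the sketch above only covers the immediate first-order case.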
Journal
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation