Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation: Latest Papers

Accepting blame for safe tunneled exceptions

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2016-06-02 DOI: 10.1145/2908080.2908086

Yizhou Zhang, G. Salvaneschi, Quinn Beightol, B. Liskov, A. Myers

Unhandled exceptions crash programs, so a compile-time check that exceptions are handled should in principle make software more reliable. But designers of some recent languages have argued that the benefits of statically checked exceptions are not worth the costs. We introduce a new statically checked exception mechanism that addresses the problems with existing checked-exception mechanisms. In particular, it interacts well with higher-order functions and other design patterns. The key insight is that whether an exception should be treated as a "checked" exception is not a property of its type but rather of the context in which the exception propagates. Statically checked exceptions can "tunnel" through code that is oblivious to their presence, but the type system nevertheless checks that these exceptions are handled. Further, exceptions can be tunneled without being accidentally caught, by expanding the space of exception identifiers to identify the exception-handling context. The resulting mechanism is expressive and syntactically light, and can be implemented efficiently. We demonstrate the expressiveness of the mechanism using significant codebases and evaluate its performance. We have implemented this new exception mechanism as part of the new Genus programming language, but the mechanism could equally well be applied to other programming languages.
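
The run-time side of tunneling can be approximated in Python, which has no checked exceptions. In the sketch below, an exception raised by a callback is wrapped with a label that identifies the handling context, so a `try`/`except` in oblivious intermediate code cannot accidentally catch it, while the context that supplied the callback can. This is a hypothetical emulation: `Tunneled`, `Context`, `handling`, `sum_nonnegative`, and `parse` are invented names, making `Tunneled` a `BaseException` (so ordinary handlers skip it) is a Python-specific trick, and the compile-time check that Genus performs is not modeled at all.

```python
import itertools
from contextlib import contextmanager

_next_label = itertools.count()  # a fresh label for every handling context


class Tunneled(BaseException):
    """Carries an exception through oblivious code, tagged with the label of the
    handling context it belongs to. Deriving from BaseException (an assumption
    of this sketch) keeps ordinary `except ValueError` or `except Exception`
    clauses in intermediate code from intercepting it."""

    def __init__(self, label, inner):
        super().__init__(label, inner)
        self.label = label
        self.inner = inner


class Context:
    def __init__(self, exc_type):
        self.label = next(_next_label)
        self.exc_type = exc_type
        self.caught = None

    def wrap(self, fn):
        """Return fn with its exc_type exceptions redirected to this context."""
        def wrapped(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except self.exc_type as e:
                raise Tunneled(self.label, e) from e
        return wrapped


@contextmanager
def handling(exc_type):
    ctx = Context(exc_type)
    try:
        yield ctx
    except Tunneled as t:
        if t.label != ctx.label:
            raise  # tunneled to some other (enclosing) context
        ctx.caught = t.inner


def sum_nonnegative(items, key):
    """Oblivious higher-order function: it knows nothing about key's errors."""
    total = 0
    for item in items:
        try:
            value = key(item)
            if value < 0:
                raise ValueError("negative value")  # the function's own signal
        except ValueError:
            continue  # meant for the negative case, but also traps key's errors
        total += value
    return total


def parse(s):
    return int(s)  # raises ValueError on malformed input


print(sum_nonnegative(["3", "-2", "oops", "4"], parse))  # 7: "oops" is silently skipped

with handling(ValueError) as ctx:
    sum_nonnegative(["3", "-2", "oops", "4"], ctx.wrap(parse))
print(type(ctx.caught).__name__)  # ValueError: it reached the intended handler
```

Because labels are per context rather than per exception type, two nested handlers for the same type do not steal each other's exceptions.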

Citations: 20

GreenWeb: language extensions for energy-efficient mobile web computing

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2016-06-02 DOI: 10.1145/2908080.2908082

Yuhao Zhu, V. Reddi

Web computing is gradually shifting toward mobile devices, in which the energy budget is severely constrained. As a result, Web developers must be conscious of energy efficiency. However, current Web languages give developers little control over energy consumption. In this paper, we take a first step toward language-level research to enable energy-efficient Web computing. Our key motivation is that mobile systems can wisely budget energy usage if informed of user quality-of-service (QoS) constraints. To do this, programmers need new abstractions. We propose two language abstractions, QoS type and QoS target, to capture two fundamental aspects of user QoS experience. We then present GreenWeb, a set of language extensions that empower developers to easily express the QoS abstractions as program annotations. As a proof of concept, we develop a GreenWeb runtime, which intelligently determines how to deliver the specified user QoS expectations while minimizing energy consumption. Overall, GreenWeb shows significant energy savings (29.2% to 66.0%) over Android’s default Interactive governor with few QoS violations. Our work demonstrates a promising first step toward language innovations for energy-efficient Web computing.
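
One way a runtime could act on such annotations: convert the QoS target into a latency budget and choose the lowest-energy hardware configuration predicted to meet it. In this sketch the two QoS types are assumed to be a one-shot response (`"single"`, target in milliseconds) and a continuous animation (`"continuous"`, target in frames per second). The configuration table, its numbers, and the names `Config`, `budget_ms`, and `choose_config` are hypothetical, and GreenWeb's real annotation syntax and performance/energy models are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    latency_ms: float  # predicted latency of the annotated event (hypothetical)
    energy_mj: float   # predicted energy of the same event (hypothetical)


def budget_ms(qos_type, target):
    """Turn a QoS annotation into a per-event latency budget in milliseconds.
    Assumed encoding: 'single' targets are latencies in ms,
    'continuous' targets are frame rates in frames per second."""
    if qos_type == "single":
        return float(target)
    if qos_type == "continuous":
        return 1000.0 / target
    raise ValueError(f"unknown QoS type: {qos_type!r}")


def choose_config(configs, qos_type, target):
    """Pick the lowest-energy config that meets the budget; if none does,
    fall back to the fastest config to keep the QoS violation small."""
    budget = budget_ms(qos_type, target)
    feasible = [c for c in configs if c.latency_ms <= budget]
    if not feasible:
        return min(configs, key=lambda c: c.latency_ms)
    return min(feasible, key=lambda c: c.energy_mj)


# Illustrative numbers only, not measurements.
CONFIGS = [
    Config("little-0.6GHz", latency_ms=40.0, energy_mj=8.0),
    Config("little-1.2GHz", latency_ms=22.0, energy_mj=11.0),
    Config("big-1.0GHz", latency_ms=14.0, energy_mj=18.0),
    Config("big-2.0GHz", latency_ms=8.0, energy_mj=30.0),
]

print(choose_config(CONFIGS, "single", 100).name)     # little-0.6GHz
print(choose_config(CONFIGS, "continuous", 60).name)  # big-1.0GHz
```

A loose 100 ms response target lets the slowest, cheapest configuration win, while a 60 fps animation (about 16.7 ms per frame) forces a faster one.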

Citations: 23

Ivy: safety verification by interactive generalization

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2016-06-02 DOI: 10.1145/2908080.2908118

O. Padon, K. McMillan, Aurojit Panda, Shmuel Sagiv, Sharon Shoham

Despite several decades of research, the problem of formal verification of infinite-state systems has resisted effective automation. We describe a system, Ivy, for interactively verifying safety of infinite-state systems. Ivy's key principle is that whenever verification fails, Ivy graphically displays a concrete counterexample to induction. The user then interactively guides generalization from this counterexample. This process continues until an inductive invariant is found. Ivy searches for universally quantified invariants and uses a restricted modeling language. This ensures that all verification conditions can be checked algorithmically. All user interactions are performed using graphical models, easing the user's task. We describe our initial experience with verifying several distributed protocols.
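
The counterexample-to-induction (CTI) idea can be illustrated on a toy finite-state system, where every condition can be checked by enumeration. The sketch checks the three usual conditions on a candidate invariant (initiation, consecution, and implication of the safety property) and, when consecution fails, returns the pre-state and post-state of a CTI. The counter system and the function `check_invariant` are hypothetical; Ivy itself handles infinite-state systems by restricting the modeling language so that verification conditions can be checked algorithmically, and the generalization step is done by the user, not by this code.

```python
def check_invariant(states, init, step, safe, inv):
    """Check a candidate invariant on a finite, deterministic toy system.

    Returns None if inv holds initially, is preserved by every transition, and
    implies safe; otherwise returns (failed_condition, state, successor)."""
    for s in states:
        if init(s) and not inv(s):
            return ("initiation", s, None)
    for s in states:
        if inv(s) and not inv(step(s)):
            return ("consecution", s, step(s))  # a counterexample to induction
    for s in states:
        if inv(s) and not safe(s):
            return ("safety", s, None)
    return None


# Toy system: a 3-bit counter that starts at 0 and adds 2 each step.
STATES = range(8)

def init(x):
    return x == 0

def step(x):
    return (x + 2) % 8

def safe(x):
    return x != 5

# Using the safety property itself as the invariant yields a CTI: 3 is not
# reachable, but it satisfies the invariant and steps to the bad state 5.
print(check_invariant(STATES, init, step, safe, safe))  # ('consecution', 3, 5)

# Generalizing from that CTI (rule out odd values) gives an inductive invariant.
print(check_invariant(STATES, init, step, safe, lambda x: x % 2 == 0))  # None
```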

Citations: 177

MapReduce program synthesis

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2016-06-02 DOI: 10.1145/2908080.2908102

Calvin Smith, Aws Albarghouthi

By abstracting away the complexity of distributed systems, large-scale data processing platforms (MapReduce, Hadoop, Spark, Dryad, etc.) have provided developers with simple means for harnessing the power of the cloud. In this paper, we ask whether we can automatically synthesize MapReduce-style distributed programs from input–output examples. Our ultimate goal is to enable end users to specify large-scale data analyses through the simple interface of examples. We thus present a new algorithm and tool for synthesizing programs composed of efficient data-parallel operations that can execute on cloud computing infrastructure. We evaluate our tool on a range of real-world big-data analysis tasks and general computations. Our results demonstrate the efficiency of our approach and the small number of examples it requires to synthesize correct, scalable programs.
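
To make the programming-by-example setting concrete, here is a minimal enumerative sketch: it searches a small, hypothetical library of mapper and reducer functions for a map-then-reduce pipeline consistent with every given example. This is plain enumeration over a fixed component list, assumed for illustration; it is not the synthesis algorithm from the paper, which targets richer compositions of data-parallel operations.

```python
from functools import reduce
from itertools import product

MAPPERS = {  # hypothetical component library
    "identity": lambda x: x,
    "square": lambda x: x * x,
    "is_even": lambda x: 1 if x % 2 == 0 else 0,
    "negate": lambda x: -x,
}
REDUCERS = {
    # (binary operator, identity element): each pair forms a monoid, so the
    # reduction can be split across workers and the partial results combined.
    "sum": (lambda a, b: a + b, 0),
    "max": (max, float("-inf")),
    "min": (min, float("inf")),
}


def run(mapper, reducer, data):
    op, unit = reducer
    return reduce(op, map(mapper, data), unit)


def synthesize(examples):
    """Return the first (mapper, reducer) name pair consistent with every example."""
    for (mname, m), (rname, r) in product(MAPPERS.items(), REDUCERS.items()):
        if all(run(m, r, inp) == out for inp, out in examples):
            return mname, rname
    return None


# Intended task: count the even numbers in each input list.
examples = [([1, 2, 3, 4], 2), ([5, 7], 0), ([6], 1)]
print(synthesize(examples))  # ('is_even', 'sum')
```

Restricting reducers to associative operators with an identity is what would let the synthesized pipeline run data-parallel rather than only sequentially.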

Citations: 91

Higher-order and tuple-based massively-parallel prefix sums

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2016-06-02 DOI: 10.1145/2908080.2908089

Sepideh Maleki, Annie Yang, Martin Burtscher

Prefix sums are an important parallel primitive, especially in massively-parallel programs. This paper discusses two orthogonal generalizations thereof, which we call higher-order and tuple-based prefix sums. Moreover, it describes and evaluates SAM, a GPU-friendly algorithm that computes prefix sums and other scans and directly supports higher orders and tuple values. Its templated CUDA implementation unifies all of these computations in a single 100-statement kernel. SAM is communication-efficient in the sense that it minimizes main-memory accesses. When computing prefix sums of a million or more values, it outperforms Thrust and CUDPP on both a Titan X and a K40 GPU. On the Titan X, SAM reaches memory-copy speed for large input sizes, a throughput that cannot be surpassed. SAM outperforms CUB, the currently fastest conventional prefix sum implementation, by up to a factor of 2.9 on eighth-order prefix sums and by up to a factor of 2.6 on eight-tuple prefix sums.
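
Sequential reference definitions help pin down the two generalizations. Assuming, as we read the terms, that an order-k prefix sum applies the inclusive prefix sum k times and that an m-tuple prefix sum computes independent prefix sums over the m interleaved sequences (elements i, i+m, i+2m, ...), a Python sketch follows. It only fixes the semantics; SAM's GPU algorithm and its memory-access optimizations are not modeled, and the function names are hypothetical.

```python
from itertools import accumulate
from operator import add


def prefix_sum(xs, op=add):
    """Inclusive scan: out[i] = xs[0] op xs[1] op ... op xs[i]."""
    return list(accumulate(xs, op))


def higher_order_prefix_sum(xs, order, op=add):
    """Order-k prefix sum: the inclusive scan applied k times in succession."""
    out = list(xs)
    for _ in range(order):
        out = prefix_sum(out, op)
    return out


def tuple_prefix_sum(xs, m, op=add):
    """m-tuple prefix sum: element i is combined only with earlier elements
    whose index is congruent to i modulo m."""
    out = list(xs)
    for i in range(m, len(out)):
        out[i] = op(out[i - m], out[i])
    return out


data = [1, 2, 3, 4, 5, 6]
print(prefix_sum(data))                  # [1, 3, 6, 10, 15, 21]
print(higher_order_prefix_sum(data, 2))  # [1, 4, 10, 20, 35, 56]
print(tuple_prefix_sum(data, 2))         # [1, 2, 4, 6, 9, 12]
```

Passing a different associative `op` (for example `max`) gives the corresponding generalized scans, matching the abstract's mention of "other scans".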

Citations: 16

Idle time garbage collection scheduling

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2016-06-02 DOI: 10.1145/2908080.2908106

Ulan Degenbaev, J. Eisinger, M. Ernst, R. McIlroy, H. Payer

Efficient garbage collection is increasingly important in today's managed language runtime systems that demand low latency, low memory consumption, and high throughput. Garbage collection may pause the application for many milliseconds to identify live memory, free unused memory, and compact fragmented regions of memory, even when employing concurrent garbage collection. In animation-based applications that require 60 frames per second, these pause times may be observable, degrading user experience. This paper introduces idle time garbage collection scheduling to increase the responsiveness of applications by hiding expensive garbage collection operations inside small, otherwise unused idle portions of the application's execution, resulting in smoother animations. Additionally, we take advantage of idleness to reduce memory consumption while allowing higher memory use when high throughput is required. We implemented idle time garbage collection scheduling in V8, an open-source, production JavaScript virtual machine running within Chrome. We present performance results on various benchmarks running popular webpages and show that idle time garbage collection scheduling can significantly improve latency and memory consumption. Furthermore, we introduce a new metric called frame time discrepancy to quantify the quality of the user experience and precisely measure the improvements that idle time garbage collection provides for a WebGL-based game benchmark. Idle time garbage collection is shipped and enabled by default in Chrome.
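
The core scheduling idea can be sketched as follows: at 60 frames per second each frame has a budget of about 16.7 ms, and whatever the frame does not spend on rendering is an idle period into which GC work is placed only if its estimated duration fits before the deadline. The task names, cost estimates, and the in-order policy (stop at the first task that would overrun) are hypothetical choices for illustration, not V8's actual heuristics.

```python
from collections import deque
from dataclasses import dataclass

FRAME_BUDGET_MS = 1000.0 / 60  # about 16.7 ms per frame at 60 fps


@dataclass
class GcTask:
    name: str
    estimated_ms: float  # predicted duration (hypothetical estimate)


def run_idle_tasks(render_ms, pending):
    """Run queued GC tasks in order while the next one fits in the idle time
    left in this frame; stop at the first that would overrun the deadline."""
    idle_left = FRAME_BUDGET_MS - render_ms
    ran = []
    while pending and pending[0].estimated_ms <= idle_left:
        task = pending.popleft()
        idle_left -= task.estimated_ms
        ran.append(task.name)
    return ran


pending = deque([GcTask("mark-step", 2.0), GcTask("minor-gc", 4.0), GcTask("compaction", 9.0)])
print(run_idle_tasks(10.0, pending))  # ['mark-step', 'minor-gc']: compaction would overrun
print(run_idle_tasks(4.0, pending))   # ['compaction'] fits in the next, lighter frame
```

Deferring the large task rather than starting it late is what keeps the frame deadline intact; the cost is that memory is reclaimed a frame later.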

Citations: 21

Temporal NetKAT

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2016-06-02 DOI: 10.1145/2908080.2908108

Ryan Beckett, M. Greenberg, D. Walker

Over the past 5-10 years, the rise of software-defined networking (SDN) has inspired a wide range of new systems, libraries, hypervisors and languages for programming, monitoring, and debugging network behavior. Oftentimes, these systems are disjoint: one language for programming, another for verification, and yet another for run-time monitoring and debugging. In this paper, we present a new, unified framework, called Temporal NetKAT, capable of facilitating all of these tasks at once. As its name suggests, Temporal NetKAT is the synthesis of two formal theories: past-time (finite trace) linear temporal logic and (network) Kleene Algebra with Tests. Temporal predicates allow programmers to write down concise properties of a packet’s path through the network and to make dynamic packet-forwarding, access control or debugging decisions on that basis. In addition to being useful for programming, the combined equational theory of LTL and NetKAT facilitates proofs of path-based correctness properties. Using new, general proof techniques, we show that the equational semantics is sound with respect to the denotational semantics, and, for a class of programs we call network-wide programs, complete. We have also implemented a compiler for Temporal NetKAT, evaluated its performance on a range of benchmarks, and studied the effectiveness of several optimizations.
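
Because the temporal operators look only into the past and traces are finite, a temporal predicate can be evaluated directly over the sequence of hops a packet has taken so far. The sketch implements a few standard past-time LTL operators (previously, ever/once, since) as Python combinators over a list of hop records. The hop representation, switch names, and example policy are hypothetical, and neither Temporal NetKAT's syntax nor its equational theory or compiler is modeled.

```python
def atom(pred):
    """Holds at position i if pred holds of the i-th hop record."""
    return lambda trace, i: pred(trace[i])

def not_(f):
    return lambda trace, i: not f(trace, i)

def prev(f):
    """'Previously': f held at the immediately preceding hop."""
    return lambda trace, i: i > 0 and f(trace, i - 1)

def ever(f):
    """'Once': f held at some hop up to and including i."""
    return lambda trace, i: any(f(trace, j) for j in range(i + 1))

def since(f, g):
    """'f since g': g held at some hop j <= i, and f held at every hop after j up to i."""
    return lambda trace, i: any(
        g(trace, j) and all(f(trace, k) for k in range(j + 1, i + 1))
        for j in range(i + 1)
    )

def holds_now(f, trace):
    """Evaluate at the packet's current (most recent) hop."""
    return f(trace, len(trace) - 1)

def at(switch):
    return atom(lambda hop: hop["switch"] == switch)

def path(*switches):
    return [{"switch": s} for s in switches]

# Hypothetical policy: the packet passed the firewall and has not visited an
# untrusted switch since then.
policy = since(not_(at("untrusted")), at("fw"))

print(holds_now(policy, path("s1", "fw", "s2")))               # True
print(holds_now(policy, path("s1", "s2")))                     # False
print(holds_now(policy, path("s1", "fw", "untrusted", "s2")))  # False
```

A forwarding or access-control decision could then branch on `holds_now(policy, history)` at the current switch.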

Citations: 26

Exposing errors related to weak memory in GPU applications

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2016-06-02 DOI: 10.1145/2908080.2908114

Tyler Sorensen, A. Donaldson

We present the systematic design of a testing environment that uses stressing and fuzzing to reveal errors in GPU applications that arise due to weak memory effects. We evaluate our approach on seven GPUs spanning three Nvidia architectures, across ten CUDA applications that use fine-grained concurrency. Our results show that applications that rarely or never exhibit errors related to weak memory when executed natively can readily exhibit these errors when executed in our testing environment. Our testing environment also provides a means to help identify the root causes of such errors, and automatically suggests how to insert fences that harden an application against weak memory bugs. To understand the cost of GPU fences, we benchmark applications with fences provided by the hardening strategy as well as a more conservative, sound fencing strategy.
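
Why fences harden an application can be seen on the classic message-passing litmus test. The sketch uses a deliberately crude toy model, not a model of any Nvidia architecture: within a thread, memory operations between fences may execute in any order (real models are more constrained, e.g. for same-address accesses), and the two threads then interleave over one shared memory. Enumerating all executions shows that the weak outcome (flag seen as set, data still stale) is possible without fences and impossible with a fence in each thread. It illustrates only the bug class and the fix; it is not the paper's stressing and fuzzing environment or its fence-insertion strategy, and all names are hypothetical.

```python
from itertools import combinations, permutations, product

FENCE = ("fence",)


def segment_orders(ops):
    """Toy model: ops may be reordered freely, but never across a fence."""
    segments, current = [], []
    for op in ops:
        if op == FENCE:
            segments.append(current)
            current = []
        else:
            current.append(op)
    segments.append(current)
    for choice in product(*(permutations(seg) for seg in segments)):
        yield [op for seg in choice for op in seg]


def interleavings(a, b):
    """All merges of sequences a and b that preserve each one's order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        pos = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if k in pos else next(ib) for k in range(n)]


def outcomes(t0, t1):
    """Set of (r1, r2) register results over every allowed execution."""
    results = set()
    for o0 in segment_orders(t0):
        for o1 in segment_orders(t1):
            for sched in interleavings(o0, o1):
                mem, regs = {"data": 0, "flag": 0}, {}
                for kind, loc, arg in sched:
                    if kind == "st":
                        mem[loc] = arg
                    else:  # "ld": arg names the destination register
                        regs[arg] = mem[loc]
                results.add((regs["r1"], regs["r2"]))
    return results


producer = [("st", "data", 1), ("st", "flag", 1)]
consumer = [("ld", "flag", "r1"), ("ld", "data", "r2")]
fenced_p = [("st", "data", 1), FENCE, ("st", "flag", 1)]
fenced_c = [("ld", "flag", "r1"), FENCE, ("ld", "data", "r2")]

print((1, 0) in outcomes(producer, consumer))  # True: stale data after seeing the flag
print((1, 0) in outcomes(fenced_p, fenced_c))  # False: fences rule it out
```

Fencing only the producer is not enough in this model, because the consumer's loads can still be reordered.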

Citations: 32

A distributed OpenCL framework using redundant computation and data replication

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2016-06-02 DOI: 10.1145/2908080.2908094

Junghyun Kim, Gangwon Jo, Jaehoon Jung, Jungwon Kim, Jaejin Lee

Applications written solely in OpenCL or CUDA cannot execute on a cluster as a whole. Most previous approaches that extend these programming models to clusters are based on a common idea: designating a centralized host node and coordinating the other nodes with the host for computation. However, the centralized host node is a serious performance bottleneck when the number of nodes is large. In this paper, we propose a scalable and distributed OpenCL framework called SnuCL-D for large-scale clusters. SnuCL-D's remote device virtualization provides an OpenCL application with the illusion that all compute devices in a cluster reside in a single node. To reduce the amount of control-message and data communication between nodes, SnuCL-D replicates the OpenCL host program execution and data in each node. We also propose a new OpenCL host API function and a queueing optimization technique that significantly reduce the overhead incurred by the previous centralized approaches. To show the effectiveness of SnuCL-D, we evaluate SnuCL-D with a microbenchmark and eleven benchmark applications on a large-scale CPU cluster and a medium-scale GPU cluster.
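
The effect of replicating host-program execution can be illustrated with a toy count of control messages. In the centralized scheme below, one host node issues every command and forwards those for remote devices; in the replicated scheme, every node runs its own copy of a deterministic host program and executes only the commands for its local devices, so the same work lands on the same nodes without control messages. Everything here is hypothetical (`Node`, `host_program`, the device layout), and buffer replication, consistency, the new host API function, and the queueing optimization are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    local_devices: frozenset
    executed: list = field(default_factory=list)


def host_program():
    """Hypothetical deterministic host program: (device_id, kernel) commands."""
    return [(0, "init"), (1, "init"), (2, "stencil"), (3, "stencil"), (0, "reduce")]


def owner_of(nodes, device):
    return next(n for n in nodes if device in n.local_devices)


def run_centralized(nodes):
    """Node 0 alone runs the host program and forwards each command for a
    remote device to its owner, costing one control message per command."""
    messages = 0
    for device, kernel in host_program():
        owner = owner_of(nodes, device)
        if owner.node_id != 0:
            messages += 1  # control message from the central host
        owner.executed.append((device, kernel))
    return messages


def run_replicated(nodes):
    """Every node runs its own copy of the host program and skips commands
    for devices it does not own; no control messages are needed."""
    for node in nodes:
        for device, kernel in host_program():
            if device in node.local_devices:
                node.executed.append((device, kernel))
    return 0  # control messages sent


central = [Node(0, frozenset({0, 1})), Node(1, frozenset({2, 3}))]
replicated = [Node(0, frozenset({0, 1})), Node(1, frozenset({2, 3}))]
print(run_centralized(central), run_replicated(replicated))  # 2 0
print(all(a.executed == b.executed for a, b in zip(central, replicated)))  # True
```

The sketch depends on the host program being deterministic, so that every replica issues the same commands in the same order.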

Citations: 17

Stratified synthesis: automatically learning the x86-64 instruction set

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pub Date : 2016-06-02 DOI: 10.1145/2908080.2908121

Stefan Heule, Eric Schkufza, Rahul Sharma, A. Aiken

The x86-64 ISA sits at the bottom of the software stack of most desktop and server software. Because of its importance, many software analysis and verification tools depend, either explicitly or implicitly, on correct modeling of the semantics of x86-64 instructions. However, formal semantics for the x86-64 ISA are difficult to obtain and often written manually through great effort. We describe an automatically synthesized formal semantics of the input/output behavior for a large fraction of the x86-64 Haswell ISA’s many thousands of instruction variants. The key to our results is stratified synthesis, where we use a set of instructions whose semantics are known to synthesize the semantics of additional instructions whose semantics are unknown. As the set of formally described instructions increases, the synthesis vocabulary expands, making it possible to synthesize the semantics of increasingly complex instructions. Using this technique we automatically synthesized formal semantics for 1,795 instruction variants of the x86-64 Haswell ISA. We evaluate the learned semantics against manually written semantics (where available) and find that they are formally equivalent with the exception of 50 instructions, where the manually written semantics contain an error. We further find the learned formulas to be largely as precise as manually written ones and of similar size.
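
The stratification loop can be illustrated on a toy 4-bit instruction set. In each stratum, the sketch tries to express every still-unknown instruction as a program of at most two known operations; whatever is learned joins the vocabulary for the next stratum, so `or` only becomes reachable after `andnot` has been learned, and `add` (not expressible with bitwise operations) is never learned. All names are hypothetical; the oracle functions stand in for executing real instructions, exhaustive testing over all 4-bit inputs stands in for a real equivalence check, and the brute-force enumeration is not the paper's search procedure.

```python
from itertools import product

BITS = 4
MASK = (1 << BITS) - 1
INPUTS = list(product(range(1 << BITS), repeat=2))  # every 4-bit (a, b) pair
LEAVES = [("a", lambda a, b: a), ("b", lambda a, b: b)]


def apply(op, x, y):
    """Combine two terms with a binary operation."""
    return lambda a, b: op(x(a, b), y(a, b))


def candidates(known):
    """Yield (text, fn) for every program with one or two known operations."""
    one_op = [(f"{name}({xt}, {yt})", apply(op, xf, yf))
              for name, op in known.items()
              for (xt, xf), (yt, yf) in product(LEAVES, repeat=2)]
    yield from one_op
    for name, op in known.items():
        for (it, inner), (lt, leaf) in product(one_op, LEAVES):
            yield f"{name}({it}, {lt})", apply(op, inner, leaf)
            yield f"{name}({lt}, {it})", apply(op, leaf, inner)


def matches(fn, oracle):
    """Stand-in for an equivalence check: compare on every 4-bit input pair."""
    return all((fn(a, b) & MASK) == (oracle(a, b) & MASK) for a, b in INPUTS)


def stratified_synthesis(base, targets):
    known = dict(base)
    learned = {}  # target name -> (stratum, program text)
    stratum = 0
    while True:
        stratum += 1
        found = {}
        for name, oracle in targets.items():
            if name in learned:
                continue
            for text, fn in candidates(known):
                if matches(fn, oracle):
                    found[name] = (text, fn)
                    break
        if not found:
            return learned
        for name, (text, fn) in found.items():
            known[name] = fn  # learned semantics join the vocabulary
            learned[name] = (stratum, text)


BASE = {"and": lambda x, y: x & y, "xor": lambda x, y: x ^ y}
TARGETS = {  # oracles stand in for running the real instructions
    "andnot": lambda a, b: a & ~b,
    "or": lambda a, b: a | b,
    "add": lambda a, b: a + b,
}
result = stratified_synthesis(BASE, TARGETS)
print({name: s for name, (s, _) in result.items()})  # {'andnot': 1, 'or': 2}
```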

Citations: 73
