
ACM Sigplan Notices: Latest Publications

Incremental inference for probabilistic programs
Q1 Computer Science Pub Date : 2018-06-11 DOI: 10.1145/3296979.3192399
Marco F. Cusumano-Towner, Benjamin Bichsel, Timon Gehr, Martin T. Vechev, Vikash K. Mansinghka
We present a novel approach for approximate sampling in probabilistic programs based on incremental inference. The key idea is to adapt the samples for a program P into samples for a program Q, thereby avoiding the expensive sampling computation for program Q. To enable incremental inference in probabilistic programming, our work: (i) introduces the concept of a trace translator which adapts samples from P into samples of Q, (ii) phrases this translation approach in the context of sequential Monte Carlo (SMC), which gives theoretical guarantees that the adapted samples converge to the distribution induced by Q, and (iii) shows how to obtain a concrete trace translator by establishing a correspondence between the random choices of the two probabilistic programs. We implemented our approach in two different probabilistic programming systems and showed that, compared to methods that sample the program Q from scratch, incremental inference can lead to orders of magnitude increase in efficiency, depending on how closely related P and Q are.
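As a rough illustration of the idea (ours, not the authors' implementation), the sketch below adapts samples drawn for a simple model P into approximate samples for a related model Q. It assumes the degenerate case where the random choices of P and Q correspond one-to-one, so the trace translator is the identity and adaptation reduces to importance weighting followed by resampling, i.e. a single SMC step; the models and their parameters are hypothetical.

```python
import math
import random

def logpdf_normal(x, mu, sigma):
    """Log-density of Normal(mu, sigma) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

# Hypothetical programs: P draws x ~ Normal(0, 1), Q draws x ~ Normal(0.5, 1.2).
def log_p(x): return logpdf_normal(x, 0.0, 1.0)
def log_q(x): return logpdf_normal(x, 0.5, 1.2)

def adapt_samples(samples_p, n_out):
    """Adapt samples targeting P into approximate samples targeting Q."""
    # Importance-weight every P-sample against Q's density.
    log_w = [log_q(x) - log_p(x) for x in samples_p]
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]   # subtract the max for numerical stability
    # Resample proportionally to the weights (the SMC resampling step).
    return random.choices(samples_p, weights=weights, k=n_out)

if __name__ == "__main__":
    samples_p = [random.gauss(0.0, 1.0) for _ in range(10_000)]
    samples_q = adapt_samples(samples_p, 10_000)
    print("mean under Q ≈", sum(samples_q) / len(samples_q))   # should land near 0.5
```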
{"title":"Incremental inference for probabilistic programs","authors":"Marco F. Cusumano-Towner, Benjamin Bichsel, Timon Gehr, Martin T. Vechev, Vikash K. Mansinghka","doi":"10.1145/3296979.3192399","DOIUrl":"https://doi.org/10.1145/3296979.3192399","url":null,"abstract":"We present a novel approach for approximate sampling in probabilistic programs based on incremental inference. The key idea is to adapt the samples for a program P into samples for a program Q, thereby avoiding the expensive sampling computation for program Q. To enable incremental inference in probabilistic programming, our work: (i) introduces the concept of a trace translator which adapts samples from P into samples of Q, (ii) phrases this translation approach in the context of sequential Monte Carlo (SMC), which gives theoretical guarantees that the adapted samples converge to the distribution induced by Q, and (iii) shows how to obtain a concrete trace translator by establishing a correspondence between the random choices of the two probabilistic programs. We implemented our approach in two different probabilistic programming systems and showed that, compared to methods that sample the program Q from scratch, incremental inference can lead to orders of magnitude increase in efficiency, depending on how closely related P and Q are.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"28 1","pages":"571 - 585"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86667229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Mapping spiking neural networks onto a manycore neuromorphic architecture
Q1 Computer Science Pub Date : 2018-06-11 DOI: 10.1145/3296979.3192371
Chit-Kwan Lin, Andreas Wild, G. Chinya, Tsung-Han Lin, Mike Davies, Hong Wang
We present a compiler for Loihi, a novel manycore neuromorphic processor that features a programmable, on-chip learning engine for training and executing spiking neural networks (SNNs). An SNN is distinguished from other neural networks in that (1) its independent computing units, or "neurons", communicate with others only through spike messages; and (2) each neuron evaluates local learning rules, which are functions of spike arrival and departure timings, to modify its local state. The collective neuronal state dynamics of an SNN form a nonlinear dynamical system that can be cast as an unconventional model of computation. To realize such an SNN on Loihi requires each constituent neuron to locally store and independently update its own spike timing information. However, each Loihi core has limited resources for this purpose and these must be shared by neurons assigned to the same core. In this work, we present a compiler for Loihi that maps the neurons of an SNN onto and across Loihi's cores efficiently. We show that a poor neuron-to-core mapping can incur significant energy costs and address this with a greedy algorithm that compiles SNNs onto Loihi in a power-efficient manner. In so doing, we highlight the need for further development of compilers for this new, emerging class of architectures.
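As a loose illustration of the neuron-to-core mapping problem (not the Loihi compiler itself), the sketch below greedily places each neuron on the core, among those with spare capacity, that already holds the most of its synaptic neighbours, so that fewer spikes cross core boundaries. The core capacity, synapse list, ordering, and tie-breaking are all hypothetical.

```python
def greedy_map(num_neurons, synapses, num_cores, core_capacity):
    """Map neurons to cores; synapses is a list of (src, dst) pairs."""
    neighbours = {n: set() for n in range(num_neurons)}
    for src, dst in synapses:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    assignment = {}                       # neuron -> core index
    load = [0] * num_cores                # neurons already placed per core
    # Place highly connected neurons first; they constrain the mapping most.
    for n in sorted(range(num_neurons), key=lambda n: -len(neighbours[n])):
        best_core, best_key = None, None
        for c in range(num_cores):
            if load[c] >= core_capacity:
                continue
            # Prefer the core already holding the most neighbours of n,
            # breaking ties towards the emptier core.
            on_core = sum(1 for m in neighbours[n] if assignment.get(m) == c)
            key = (on_core, -load[c])
            if best_key is None or key > best_key:
                best_core, best_key = c, key
        if best_core is None:
            raise ValueError("not enough core capacity for all neurons")
        assignment[n] = best_core
        load[best_core] += 1
    return assignment

if __name__ == "__main__":
    # Two disjoint 4-neuron cliques; the mapping keeps each clique on one core.
    clique = lambda lo: [(a, b) for a in range(lo, lo + 4)
                                for b in range(a + 1, lo + 4)]
    print(greedy_map(8, clique(0) + clique(4), num_cores=2, core_capacity=4))
```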
{"title":"Mapping spiking neural networks onto a manycore neuromorphic architecture","authors":"Chit-Kwan Lin, Andreas Wild, G. Chinya, Tsung-Han Lin, Mike Davies, Hong Wang","doi":"10.1145/3296979.3192371","DOIUrl":"https://doi.org/10.1145/3296979.3192371","url":null,"abstract":"We present a compiler for Loihi, a novel manycore neuromorphic processor that features a programmable, on-chip learning engine for training and executing spiking neural networks (SNNs). An SNN is distinguished from other neural networks in that (1) its independent computing units, or \"neurons\", communicate with others only through spike messages; and (2) each neuron evaluates local learning rules, which are functions of spike arrival and departure timings, to modify its local state. The collective neuronal state dynamics of an SNN form a nonlinear dynamical system that can be cast as an unconventional model of computation. To realize such an SNN on Loihi requires each constituent neuron to locally store and independently update its own spike timing information. However, each Loihi core has limited resources for this purpose and these must be shared by neurons assigned to the same core. In this work, we present a compiler for Loihi that maps the neurons of an SNN onto and across Loihi's cores efficiently. We show that a poor neuron-to-core mapping can incur significant energy costs and address this with a greedy algorithm that compiles SNNs onto Loihi in a power-efficient manner. In so doing, we highlight the need for further development of compilers for this new, emerging class of architectures.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"15 1","pages":"78 - 89"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80462146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 36
Inferring type rules for syntactic sugar
Q1 Computer Science Pub Date : 2018-06-11 DOI: 10.1145/3296979.3192398
Justin Pombrio, S. Krishnamurthi
Type systems and syntactic sugar are both valuable to programmers, but sometimes at odds. While sugar is a valuable mechanism for implementing realistic languages, the expansion process obscures program source structure. As a result, type errors can reference terms the programmers did not write (and even constructs they do not know), baffling them. The language developer must also manually construct type rules for the sugars, to give a typed account of the surface language. We address these problems by presenting a process for automatically reconstructing type rules for the surface language using rules for the core. We have implemented this theory, and show several interesting case studies.
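A minimal sketch of the underlying idea, assuming a toy language with only Boolean/number literals and if-expressions: type-check the expansion of a sugar with a fresh type variable standing in for each hole, collect the equality constraints imposed by the core type rules, and read the solved constraints back as a surface rule. The language, the Or-sugar, and the code are illustrative, not the paper's algorithm.

```python
class TVar:
    """A fresh type variable standing in for a hole of the sugar."""
    _count = 0
    def __init__(self):
        TVar._count += 1
        self.name = f"t{TVar._count}"
    def __repr__(self):
        return self.name

def resolve(t, subst):
    while isinstance(t, TVar) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    """Solve the constraint a = b over base types (strings) and TVars."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a is b or a == b:
        return
    if isinstance(a, TVar):
        subst[a] = b
    elif isinstance(b, TVar):
        subst[b] = a
    else:
        raise TypeError(f"cannot unify {a} with {b}")

def check(expr, subst):
    """Core type rules, written as a checker over core ASTs (tuples)."""
    tag = expr[0]
    if tag == "boollit":
        return "Bool"
    if tag == "numlit":
        return "Num"
    if tag == "hole":                      # metavariable of the sugar
        return expr[1]                     # its fresh type variable
    if tag == "if":                        # (c : Bool, t : T, e : T) gives T
        _, c, t, e = expr
        unify(check(c, subst), "Bool", subst)
        tt, te = check(t, subst), check(e, subst)
        unify(tt, te, subst)
        return resolve(tt, subst)
    raise ValueError(f"unknown construct {tag}")

# Sugar: Or(a, b) desugars to If(a, True, b). Type-check the expansion.
a, b = TVar(), TVar()
expansion = ("if", ("hole", a), ("boollit", True), ("hole", b))
subst = {}
result = check(expansion, subst)
print("inferred surface rule for Or(a, b):")
print(f"  a : {resolve(a, subst)},  b : {resolve(b, subst)}  |-  Or(a, b) : {result}")
```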
{"title":"Inferring type rules for syntactic sugar","authors":"Justin Pombrio, S. Krishnamurthi","doi":"10.1145/3296979.3192398","DOIUrl":"https://doi.org/10.1145/3296979.3192398","url":null,"abstract":"Type systems and syntactic sugar are both valuable to programmers, but sometimes at odds. While sugar is a valuable mechanism for implementing realistic languages, the expansion process obscures program source structure. As a result, type errors can reference terms the programmers did not write (and even constructs they do not know), baffling them. The language developer must also manually construct type rules for the sugars, to give a typed account of the surface language. We address these problems by presenting a process for automatically reconstructing type rules for the surface language using rules for the core. We have implemented this theory, and show several interesting case studies.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"94 1","pages":"812 - 825"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84594377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Symbolic reasoning for automatic signal placement
Q1 Computer Science Pub Date : 2018-04-07 DOI: 10.1145/3296979.3192395
Kostas Ferles, Jacob Van Geffen, Işıl Dillig, Y. Smaragdakis
Explicit signaling between threads is a perennial cause of bugs in concurrent programs. While there are several run-time techniques to automatically notify threads upon the availability of some shared resource, such techniques are not widely-adopted due to their run-time overhead. This paper proposes a new solution based on static analysis for automatically generating a performant explicit-signal program from its corresponding implicit-signal implementation. The key idea is to generate verification conditions that allow us to minimize the number of required signals and unnecessary context switches, while guaranteeing semantic equivalence between the source and target programs. We have implemented our method in a tool called Expresso and evaluate it on challenging benchmarks from prior papers and open-source software. Expresso-generated code significantly outperforms past automatic signaling mechanisms (avg. 1.56x speedup) and closely matches the performance of hand-optimized explicit-signal code.
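To make the implicit-versus-explicit distinction concrete, here is a bounded-buffer example of ours (not from the paper): the implicit-signal version waits on a predicate and conservatively wakes every waiter after any state change, while the explicit-signal version, the kind of code such a tool aims to produce, signals a specific condition variable only at the points where its predicate can become true, avoiding needless wake-ups.

```python
import threading
from collections import deque

class ImplicitBuffer:
    """Implicit-signal style: one condition, broadcast after every mutation."""
    def __init__(self, cap):
        self.cap, self.items = cap, deque()
        self.lock = threading.Lock()
        self.changed = threading.Condition(self.lock)

    def put(self, x):
        with self.changed:
            self.changed.wait_for(lambda: len(self.items) < self.cap)
            self.items.append(x)
            self.changed.notify_all()      # wake everyone after any change

    def get(self):
        with self.changed:
            self.changed.wait_for(lambda: self.items)
            x = self.items.popleft()
            self.changed.notify_all()
            return x

class ExplicitBuffer:
    """Explicit-signal style: dedicated conditions, signalled only where needed."""
    def __init__(self, cap):
        self.cap, self.items = cap, deque()
        self.lock = threading.Lock()
        self.not_full = threading.Condition(self.lock)
        self.not_empty = threading.Condition(self.lock)

    def put(self, x):
        with self.lock:
            while len(self.items) >= self.cap:
                self.not_full.wait()
            self.items.append(x)
            self.not_empty.notify()        # only "not empty" can become true here

    def get(self):
        with self.lock:
            while not self.items:
                self.not_empty.wait()
            x = self.items.popleft()
            self.not_full.notify()         # only "not full" can become true here
            return x
```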
{"title":"Symbolic reasoning for automatic signal placement","authors":"Kostas Ferles, Jacob Van Geffen, Işıl Dillig, Y. Smaragdakis","doi":"10.1145/3296979.3192395","DOIUrl":"https://doi.org/10.1145/3296979.3192395","url":null,"abstract":"Explicit signaling between threads is a perennial cause of bugs in concurrent programs. While there are several run-time techniques to automatically notify threads upon the availability of some shared resource, such techniques are not widely-adopted due to their run-time overhead. This paper proposes a new solution based on static analysis for automatically generating a performant explicit-signal program from its corresponding implicit-signal implementation. The key idea is to generate verification conditions that allow us to minimize the number of required signals and unnecessary context switches, while guaranteeing semantic equivalence between the source and target programs. We have implemented our method in a tool called Expresso and evaluate it on challenging benchmarks from prior papers and open-source software. Expresso-generated code significantly outperforms past automatic signaling mechanisms (avg. 1.56x speedup) and closely matches the performance of hand-optimized explicit-signal code.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"1 1","pages":"120 - 134"},"PeriodicalIF":0.0,"publicationDate":"2018-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86897473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Enhancing Cross-ISA DBT Through Automatically Learned Translation Rules
Q1 Computer Science Pub Date : 2018-03-19 DOI: 10.1145/3296957.3177160
Wenwen Wang, Stephen McCamant, Antonia Zhai, Pen-Chung Yew
This paper presents a novel approach for dynamic binary translation (DBT) to automatically learn translation rules from guest and host binaries compiled from the same source code. The learned trans...
{"title":"Enhancing Cross-ISA DBT Through Automatically Learned Translation Rules","authors":"WangWenwen, McCamantStephen, ZhaiAntonia, YewPen-Chung","doi":"10.1145/3296957.3177160","DOIUrl":"https://doi.org/10.1145/3296957.3177160","url":null,"abstract":"This paper presents a novel approach for dynamic binary translation (DBT) to automatically learn translation rules from guest and host binaries compiled from the same source code. The learned trans...","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3296957.3177160","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43825226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
The Architectural Implications of Autonomous Driving
Q1 Computer Science Pub Date : 2018-03-19 DOI: 10.1145/3296957.3173191
Shih-Chieh Lin, Yunqi Zhang, Chang-Hong Hsu, Matt Skach, Md E. Haque, Lingjia Tang, Jason Mars
Autonomous driving systems have attracted a significant amount of interest recently, and many industry leaders, such as Google, Uber, Tesla, and Mobileye, have invested a large amount of capital an...
{"title":"The Architectural Implications of Autonomous Driving","authors":"LinShih-Chieh, ZhangYunqi, HsuChang-Hong, SkachMatt, E. HaqueMd, TangLingjia, MarsJason","doi":"10.1145/3296957.3173191","DOIUrl":"https://doi.org/10.1145/3296957.3173191","url":null,"abstract":"Autonomous driving systems have attracted a significant amount of interest recently, and many industry leaders, such as Google, Uber, Tesla, and Mobileye, have invested a large amount of capital an...","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3296957.3173191","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42958270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
FirmUp
Q1 Computer Science Pub Date : 2018-03-19 DOI: 10.1145/3296957.3177157
Yaniv David, Nimrod Partush, Eran Yahav
We present a static, precise, and scalable technique for finding CVEs (Common Vulnerabilities and Exposures) in stripped firmware images. Our technique is able to efficiently find vulnerabilities in real-world firmware with high accuracy. Given a vulnerable procedure in an executable binary and a firmware image containing multiple stripped binaries, our goal is to detect possible occurrences of the vulnerable procedure in the firmware image. Due to the variety of architectures and unique tool chains used by vendors, as well as the highly customized nature of firmware, identifying procedures in stripped firmware is extremely challenging. Vulnerability detection requires not only pairwise similarity between procedures but also information about the relationships between procedures in the surrounding executable. This observation serves as the foundation for a novel technique that establishes a partial correspondence between procedures in the two binaries. We implemented our technique in a tool called FirmUp and performed an extensive evaluation over 40 million procedures, over 4 different prevalent architectures, crawled from public vendor firmware images. We discovered 373 vulnerabilities affecting publicly available firmware, 147 of them in the latest available firmware version for the device. A thorough comparison of FirmUp to previous methods shows that it accurately and effectively finds vulnerabilities in firmware, while outperforming the detection rate of the state of the art by 45% on average.
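As a very rough sketch of why a partial correspondence helps (the paper's matching procedure is more involved), the code below scores procedures by a simple feature-set similarity and then builds a one-to-one matching greedily, so that other procedures claim their own best counterparts and a lone pairwise hit is less likely to be a false match. The feature sets, procedure names, and greedy policy are all hypothetical.

```python
def similarity(f1, f2):
    """Jaccard similarity between two procedure feature sets."""
    if not f1 and not f2:
        return 0.0
    return len(f1 & f2) / len(f1 | f2)

def partial_correspondence(procs_a, procs_b):
    """procs_*: dict name -> feature set; returns a partial matching a-name -> b-name."""
    scores = {(a, b): similarity(fa, fb)
              for a, fa in procs_a.items() for b, fb in procs_b.items()}
    matched, used_b = {}, set()
    # Repeatedly commit the highest-scoring pair whose endpoints are still free.
    for (a, b), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s == 0.0 or a in matched or b in used_b:
            continue
        matched[a] = b
        used_b.add(b)
    return matched

if __name__ == "__main__":
    # Query binary containing the known-vulnerable procedure; features here are
    # made-up constants and strings extracted from each procedure.
    query = {
        "parse_header": {"0x1f8b", "bad magic", "0x8000"},
        "read_chunk":   {"0x200", "short read"},
    }
    # Stripped firmware binary: unnamed procedures with the same kind of features.
    firmware = {
        "sub_401000": {"0x1f8b", "bad magic", "0x8000"},
        "sub_402000": {"0x200", "short read"},
        "sub_403000": {"0x7f454c46"},
    }
    match = partial_correspondence(query, firmware)
    print("parse_header likely corresponds to", match.get("parse_header"))
```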
{"title":"FirmUp","authors":"Yaniv David, Nimrod Partush, Eran Yahav","doi":"10.1145/3296957.3177157","DOIUrl":"https://doi.org/10.1145/3296957.3177157","url":null,"abstract":"We present a static, precise, and scalable technique for finding CVEs (Common Vulnerabilities and Exposures) in stripped firmware images. Our technique is able to efficiently find vulnerabilities in real-world firmware with high accuracy. Given a vulnerable procedure in an executable binary and a firmware image containing multiple stripped binaries, our goal is to detect possible occurrences of the vulnerable procedure in the firmware image. Due to the variety of architectures and unique tool chains used by vendors, as well as the highly customized nature of firmware, identifying procedures in stripped firmware is extremely challenging. Vulnerability detection requires not only pairwise similarity between procedures but also information about the relationships between procedures in the surrounding executable. This observation serves as the foundation for a novel technique that establishes a partial correspondence between procedures in the two binaries. We implemented our technique in a tool called FirmUp and performed an extensive evaluation over 40 million procedures, over 4 different prevalent architectures, crawled from public vendor firmware images. We discovered 373 vulnerabilities affecting publicly available firmware, 147 of them in the latest available firmware version for the device. A thorough comparison of FirmUp to previous methods shows that it accurately and effectively finds vulnerabilities in firmware, while outperforming the detection rate of the state of the art by 45% on average.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"183 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85617753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Gloss
Q1 Computer Science Pub Date : 2018-03-19 DOI: 10.1145/3296957.3173170
S. Rajadurai, Jeffrey Bosboom, W. Wong, S. Amarasinghe
An important class of applications computes on long-running or infinite streams of data, often with known fixed data rates. The latter is referred to as synchronous data flow ~(SDF) streams. These stream applications need to run on clusters or the cloud due to the high performance requirement. Further, they require live reconfiguration and reoptimization for various reasons such as hardware maintenance, elastic computation, or to respond to fluctuations in resources or application workload. However, reconfiguration and reoptimization without downtime while accurately preserving program state in a distributed environment is difficult. In this paper, we introduce Gloss, a suite of compiler and runtime techniques for live reconfiguration of distributed stream programs. Gloss, for the first time, avoids periods of zero throughput during the reconfiguration of both stateless and stateful SDF based stream programs. Furthermore, unlike other systems, Gloss globally reoptimizes and completely recompiles the program during reconfiguration. This permits it to reoptimize the application for entirely new configurations that it may not have encountered before. All these Gloss operations happen in-situ, requiring no extra hardware resources. We show how Gloss allows stream programs to reconfigure and reoptimize with no downtime and minimal overhead, and demonstrate the wider applicability of it via a variety of experiments.
{"title":"Gloss","authors":"S. Rajadurai, Jeffrey Bosboom, W. Wong, S. Amarasinghe","doi":"10.1145/3296957.3173170","DOIUrl":"https://doi.org/10.1145/3296957.3173170","url":null,"abstract":"\u0000 An important class of applications computes on long-running or infinite streams of data, often with known fixed data rates. The latter is referred to as\u0000 synchronous data flow\u0000 ~(SDF) streams. These stream applications need to run on clusters or the cloud due to the high performance requirement. Further, they require live reconfiguration and reoptimization for various reasons such as hardware maintenance, elastic computation, or to respond to fluctuations in resources or application workload. However, reconfiguration and reoptimization without downtime while accurately preserving program state in a distributed environment is difficult. In this paper, we introduce Gloss, a suite of compiler and runtime techniques for live reconfiguration of distributed stream programs. Gloss, for the first time, avoids periods of zero throughput during the reconfiguration of both stateless and stateful SDF based stream programs. Furthermore, unlike other systems, Gloss globally reoptimizes and completely recompiles the program during reconfiguration. This permits it to reoptimize the application for entirely new configurations that it may not have encountered before. All these Gloss operations happen in-situ, requiring no extra hardware resources. We show how Gloss allows stream programs to reconfigure and reoptimize with no downtime and minimal overhead, and demonstrate the wider applicability of it via a variety of experiments.\u0000","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"344 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77621877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
LTRF
Q1 Computer Science Pub Date : 2018-03-19 DOI: 10.1145/3296957.3173211
Mohammad Sadrosadati, Amirhossein Mirhosseini, Seyed Borna Ehsani, H. Sarbazi-Azad, M. Drumond, B. Falsafi, Rachata Ausavarungnirun, O. Mutlu
Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes hierarchical register file, to reduce the register file power consumption by caching registers in a smaller register file cache. Unfortunately, this approach does not improve register access latency due to the low hit rate in the register file cache. In this paper, we propose the Latency-Tolerant Register File (LTRF) architecture to achieve low latency in a two-level hierarchical structure while keeping power consumption low. We observe that compile-time interval analysis enables us to divide GPU program execution into intervals with an accurate estimate of a warp's aggregate register working-set within each interval. The key idea of LTRF is to prefetch the estimated register working-set from the main register file to the register file cache under software control, at the beginning of each interval, and overlap the prefetch latency with the execution of other warps. Our experimental results show that LTRF enables high-capacity yet long-latency main GPU register files, paving the way for various optimizations. As an example optimization, we implement the main register file with emerging high-density high-latency memory technologies, enabling 8X larger capacity and improving overall GPU performance by 31% while reducing register file power consumption by 46%.
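The sketch below is a toy software model of the scheduling idea (ours, not the LTRF hardware): each warp's execution is split into compile-time intervals annotated with the registers they touch, a prefetch of the next interval's working set is issued when the current interval ends, and other ready warps run while that prefetch is in flight. All latencies and interval lengths are made-up numbers for illustration.

```python
from collections import deque

PREFETCH_LATENCY = 20        # cycles to move a working set into the RF cache (made up)

class Warp:
    def __init__(self, name, intervals):
        # intervals: list of (register_working_set, compute_cycles)
        self.name = name
        self.intervals = deque(intervals)
        self.ready_at = 0        # cycle when its pending prefetch completes
        self.done = False

def run(warps):
    cycle = 0
    # Kick off the first prefetch of every warp.
    for w in warps:
        regs, _ = w.intervals[0]
        print(f"[{cycle:4d}] {w.name}: prefetch {sorted(regs)}")
        w.ready_at = cycle + PREFETCH_LATENCY
    while not all(w.done for w in warps):
        ready = [w for w in warps if not w.done and w.ready_at <= cycle]
        if not ready:
            # No warp has its registers cached yet; skip ahead to the next completion.
            cycle = min(w.ready_at for w in warps if not w.done)
            continue
        w = ready[0]                       # simple greedy pick among ready warps
        regs, compute = w.intervals.popleft()
        print(f"[{cycle:4d}] {w.name}: execute interval on {sorted(regs)} for {compute} cycles")
        cycle += compute
        if w.intervals:                    # prefetch the next interval's registers
            next_regs, _ = w.intervals[0]
            print(f"[{cycle:4d}] {w.name}: prefetch {sorted(next_regs)}")
            w.ready_at = cycle + PREFETCH_LATENCY
        else:
            w.done = True

if __name__ == "__main__":
    run([Warp("warp0", [({"r0", "r1"}, 30), ({"r2", "r3"}, 25)]),
         Warp("warp1", [({"r4", "r5"}, 30), ({"r6"}, 25)])])
```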
{"title":"LTRF","authors":"Mohammad Sadrosadati, Amirhossein Mirhosseini, Seyed Borna Ehsani, H. Sarbazi-Azad, M. Drumond, B. Falsafi, Rachata Ausavarungnirun, O. Mutlu","doi":"10.1145/3296957.3173211","DOIUrl":"https://doi.org/10.1145/3296957.3173211","url":null,"abstract":"Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes hierarchical register file, to reduce the register file power consumption by caching registers in a smaller register file cache. Unfortunately, this approach does not improve register access latency due to the low hit rate in the register file cache. In this paper, we propose the Latency-Tolerant Register File (LTRF) architecture to achieve low latency in a two-level hierarchical structure while keeping power consumption low. We observe that compile-time interval analysis enables us to divide GPU program execution into intervals with an accurate estimate of a warp's aggregate register working-set within each interval. The key idea of LTRF is to prefetch the estimated register working-set from the main register file to the register file cache under software control, at the beginning of each interval, and overlap the prefetch latency with the execution of other warps. Our experimental results show that LTRF enables high-capacity yet long-latency main GPU register files, paving the way for various optimizations. As an example optimization, we implement the main register file with emerging high-density high-latency memory technologies, enabling 8X larger capacity and improving overall GPU performance by 31% while reducing register file power consumption by 46%.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"80 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86490881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing
Q1 Computer Science Pub Date : 2018-03-19 DOI: 10.1145/3296957.3173187
Zhibin Yu, Zhendong Bei, Xuehai Qian
In-Memory cluster Computing (IMC) frameworks (e.g., Spark) have become increasingly important because they typically achieve more than 10× speedups over the traditional On-Disk cluster Computing (O...
{"title":"Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing","authors":"YuZhibin, BeiZhendong, QianXuehai","doi":"10.1145/3296957.3173187","DOIUrl":"https://doi.org/10.1145/3296957.3173187","url":null,"abstract":"In-Memory cluster Computing (IMC) frameworks (e.g., Spark) have become increasingly important because they typically achieve more than 10× speedups over the traditional On-Disk cluster Computing (O...","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3296957.3173187","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45723935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16