
Latest publications from ASPLOS X

Compiler optimization of scalar value communication between speculative threads
Pub Date : 2002-10-05 DOI: 10.1145/605397.605416
Antonia Zhai, Christopher B. Colohan, J. Steffan, T. Mowry
While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS, which is stalls due to forwarding scalar values between threads that would otherwise cause frequent data dependences. We present and evaluate dataflow algorithms for three increasingly-aggressive instruction scheduling techniques that reduce the critical forwarding path introduced by the synchronization associated with this data forwarding. In addition, we contrast our compiler techniques with related hardware-only approaches. With our most aggressive compiler and hardware techniques, we improve performance under TLS by 6.2-28.5% for 6 of 14 applications, and by at least 2.7% for half of the other applications.
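The core idea above can be illustrated with a toy model (not the paper's dataflow algorithms; instruction format and names invented here): the critical forwarding path for a scalar spans from the point a thread first needs the value forwarded by its predecessor to the point it produces that scalar for its successor, and scheduling the producing instruction earlier shrinks that span.

```python
def critical_forwarding_path(instrs, scalar):
    """Instructions are (op, reads, writes) tuples; the path length is the
    number of instructions between the first read and last write of `scalar`."""
    first_use = next(i for i, (_, r, _w) in enumerate(instrs) if scalar in r)
    last_def = max(i for i, (_, _r, w) in enumerate(instrs) if scalar in w)
    return last_def - first_use + 1

def hoist_last_def(instrs, scalar):
    """Naive scheduler: move the last definition of `scalar` upward past any
    instruction it has no read/write overlap with (no true dependence)."""
    instrs = list(instrs)
    i = max(k for k, (_, _r, w) in enumerate(instrs) if scalar in w)
    while i > 0:
        _op, reads, writes = instrs[i]
        _p_op, p_reads, p_writes = instrs[i - 1]
        if reads & p_writes or writes & p_reads or writes & p_writes:
            break  # dependence on the previous instruction: stop hoisting
        instrs[i - 1], instrs[i] = instrs[i], instrs[i - 1]
        i -= 1
    return instrs

# A thread body that consumes forwarded "x" early but redefines it late:
body = [
    ("use",   {"x"}, {"t"}),   # first use of the forwarded scalar
    ("work1", {"t"}, {"u"}),
    ("work2", {"u"}, {"v"}),
    ("bump",  {"x"}, {"x"}),   # produces x for the successor thread
]
```

Hoisting `bump` directly after `use` cuts the critical forwarding path from 4 instructions to 2, so the successor thread stalls for less time waiting on `x`.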
Citations: 141
Evolving RPC for active storage
Pub Date : 2002-10-05 DOI: 10.1145/605397.605425
Muthian Sivathanu, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
We introduce Scriptable RPC (SRPC), an RPC-based framework that enables distributed system services to take advantage of active components. Technology trends point to a world where each component in a system (whether disk, network interface, or memory) has substantial computational capabilities; however, traditional methods of building distributed services are not designed to take advantage of these new architectures, mandating wholesale change of the software base to exploit more powerful hardware. In contrast, SRPC provides a direct and simple migration path for traditional services into the active environment. We demonstrate the power and flexibility of the SRPC framework through a series of case studies, with a focus on active storage servers. Specifically, we find three advantages to our approach. First, SRPC improves the performance of distributed file servers, reducing latency by combining the execution of operations at the file server. Second, SRPC enables the ready addition of new functionality; for example, more powerful cache consistency models can be realized on top of a server that exports a simple NFS-like interface. Third, SRPC simplifies the construction of distributed services; operations that are difficult to coordinate across client and server can now be co-executed at the server, thus avoiding costly agreement and crash-recovery protocols.
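The latency argument can be sketched in a few lines (a hypothetical toy server, not the SRPC implementation; the operation names are invented): shipping a script of dependent operations to the server replaces N round trips with one.

```python
# Toy file server contrasting per-operation RPC with script execution.
class FileServer:
    def __init__(self):
        self.files = {}
        self.round_trips = 0   # proxy for client-observed latency

    def rpc(self, op, *args):
        self.round_trips += 1          # each traditional RPC pays one trip
        return getattr(self, op)(*args)

    def run_script(self, script):
        self.round_trips += 1          # the whole script pays one trip
        result = None
        for op, args in script:        # operations co-executed at the server
            result = getattr(self, op)(*args)
        return result

    # The "active" operations exported by the server:
    def create(self, name): self.files[name] = b""
    def write(self, name, data): self.files[name] += data
    def read(self, name): return self.files[name]
```

Three dependent calls (`create`, `write`, `read`) cost three round trips via `rpc` but only one via `run_script`, which is the flavor of latency reduction the paper measures for active storage servers.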
Citations: 30
Joint local and global hardware adaptations for energy
Pub Date : 2002-10-05 DOI: 10.1145/605397.605413
Ruchira Sasanka, C. Hughes, S. Adve
This work concerns algorithms to control energy-driven architecture adaptations for multimedia applications, without and with dynamic voltage scaling (DVS). We identify a broad design space for adaptation control algorithms based on two attributes: (1) when to adapt or temporal granularity and (2) what structures to adapt or spatial granularity. For each attribute, adaptation may be global or local. Our previous work developed a temporally and spatially global algorithm. It invokes adaptation at the granularity of a full frame of a multimedia application (temporally global) and considers the entire hardware configuration at a time (spatially global). It exploits inter-frame execution time variability, slowing computation just enough to eliminate idle time before the real-time deadline. This paper explores temporally and spatially local algorithms and their integration with the previous global algorithm. The local algorithms invoke architectural adaptation within an application frame to exploit intra-frame execution variability, and attempt to save energy without affecting execution time. We consider local algorithms previously studied for non-real-time applications as well as propose new algorithms. We find that, for systems without and with DVS, the local algorithms are effective in saving energy for multimedia applications, but the new integrated global and local algorithm is best for the systems and applications studied.
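The temporally global, per-frame DVS policy can be sketched minimally (an illustrative simplification, not the paper's controller; frequency levels and the cycle predictor are assumed inputs): before each frame, pick the slowest setting whose predicted execution time still meets the frame deadline, converting inter-frame slack into energy savings.

```python
def pick_setting(predicted_cycles, deadline_s, freq_levels_hz):
    """Return the lowest frequency that meets the deadline for the
    predicted frame work; fall back to the highest if none does."""
    for f in sorted(freq_levels_hz):
        if predicted_cycles / f <= deadline_s:
            return f          # slowest setting with no deadline miss
    return max(freq_levels_hz)  # deadline unmeetable: run flat out
```

For a 30 fps deadline, a light frame of 10^6 cycles runs at the lowest level while a heavy 2×10^7-cycle frame is pushed to the top level, mirroring how the global algorithm slows computation "just enough."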
Citations: 76
Maté: a tiny virtual machine for sensor networks
Pub Date : 2002-10-01 DOI: 10.1145/605397.605407
P. Levis, D. Culler
Composed of tens of thousands of tiny devices with very limited resources ("motes"), sensor networks are subject to novel systems problems and constraints. The large number of motes in a sensor network means that there will often be some failing nodes; networks must be easy to repopulate. Often there is no feasible method to recharge motes, so energy is a precious resource. Once deployed, a network must be reprogrammable although physically unreachable, and this reprogramming can be a significant energy cost. We present Maté, a tiny communication-centric virtual machine designed for sensor networks. Maté's high-level interface allows complex programs to be very short (under 100 bytes), reducing the energy cost of transmitting new programs. Code is broken up into small capsules of 24 instructions, which can self-replicate through the network. Packet sending and reception capsules enable the deployment of ad-hoc routing and data aggregation algorithms. Maté's concise, high-level program representation simplifies programming and allows large networks to be frequently reprogrammed in an energy-efficient manner; in addition, its safe execution environment suggests a use of virtual machines to provide the user/kernel boundary on motes that have no hardware protection mechanisms.
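A toy interpreter in the spirit of Maté (the opcode names here are invented for illustration and are not Maté's actual instruction set): a capsule is a short instruction list, capped at 24 entries, executed over a tiny operand stack.

```python
CAPSULE_LIMIT = 24  # Maté capsules hold at most 24 instructions

def run_capsule(capsule, sensor_reading=0):
    """Execute one capsule on a minimal stack machine."""
    assert len(capsule) <= CAPSULE_LIMIT, "capsule too large to transmit"
    stack = []
    for instr in capsule:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])
        elif op == "sense":            # read the (simulated) sensor
            stack.append(sensor_reading)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "send":             # "transmit" the top of stack
            return stack.pop()
    return stack[-1] if stack else None
```

A four-instruction capsule that reads the sensor, adds an offset, and transmits the result shows why whole programs stay under 100 bytes, keeping reprogramming traffic cheap.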
Citations: 1290
Temporally silent stores
Pub Date : 2002-10-01 DOI: 10.1145/605397.605401
Kevin M. Lepak, Mikko H. Lipasti
Recent work has shown that silent stores--stores which write a value matching the one already stored at the memory location--occur quite frequently and can be exploited to reduce memory traffic and improve performance. This paper extends the definition of silent stores to encompass sets of stores that change the value stored at a memory location, but only temporarily, and subsequently return a previous value of interest to the memory location. The stores that cause the value to revert are called temporally silent stores. We redefine multiprocessor sharing to account for temporal silence and show that in the limit, up to 45% of communication misses in scientific and commercial applications can be eliminated by exploiting values that change only temporarily. We describe a practical mechanism that detects temporally silent stores and removes the coherence traffic they cause in conventional multiprocessors. We find that up to 42% of communication misses can be eliminated with a simple extension to the MESI protocol. Further, we examine application and operating system code to provide insight into the temporal silence phenomenon and characterize temporal silence by examining value frequencies and dynamic instruction distances between temporally silent pairs. These studies indicate that the operating system is involved heavily in temporal silence, in both commercial and scientific workloads, and that while detectable synchronization primitives provide substantial contributions, significant opportunity exists outside these references.
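The two definitions can be made concrete with a small software tracker (a sketch of the classification only, not the paper's hardware detection mechanism): a store is silent if it rewrites the location's current value, and temporally silent if it reverts the location to the value it held before an intermediate store changed it.

```python
class SilenceTracker:
    """Classify stores as silent, temporally silent, or neither."""
    def __init__(self):
        self.current = {}    # addr -> live value
        self.previous = {}   # addr -> value held before the last change
        self.silent = 0
        self.temporally_silent = 0

    def store(self, addr, value):
        if addr in self.current and self.current[addr] == value:
            self.silent += 1             # no visible change at all
            return
        if self.previous.get(addr) == value:
            self.temporally_silent += 1  # reverts to the earlier value
        self.previous[addr] = self.current.get(addr)
        self.current[addr] = value
```

The classic instance is a lock word: acquiring writes 1, releasing writes 0 again, so the release is temporally silent with respect to the pre-acquire value, and detecting this is what lets the extended MESI protocol suppress the coherence traffic.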
Citations: 54
Design and evaluation of compiler algorithms for pre-execution
Pub Date : 2002-10-01 DOI: 10.1145/605397.605415
Dongkeun Kim, D. Yeung
Pre-execution is a promising latency tolerance technique that uses one or more helper threads running in spare hardware contexts ahead of the main computation to trigger long-latency memory operations early, hence absorbing their latency on behalf of the main computation. This paper investigates a source-to-source C compiler for extracting pre-execution thread code automatically, thus relieving the programmer or hardware from this onerous task. At the heart of our compiler are three algorithms. First, program slicing removes non-critical code for computing cache-missing memory references, reducing pre-execution overhead. Second, prefetch conversion replaces blocking memory references with non-blocking prefetch instructions to minimize pre-execution thread stalls. Finally, threading scheme selection chooses the best scheme for initiating pre-execution threads, speculatively parallelizing loops to generate thread-level parallelism when necessary for latency tolerance. We prototyped our algorithms using the Stanford University Intermediate Format (SUIF) framework and a publicly available program slicer, called Unravel [13], and we evaluated our compiler on a detailed architectural simulator of an SMT processor. Our results show compiler-based pre-execution improves the performance of 9 out of 13 applications, reducing execution time by 22.7%. Across all 13 applications, our technique delivers an average speedup of 17.0%. These performance gains are achieved fully automatically on conventional SMT hardware, with only minimal modifications to support pre-execution threads.
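The first of the three algorithms, program slicing, can be illustrated on a simplified dependence representation (this is a generic backward slice, far simpler than what Unravel or the paper's compiler computes): keep only the statements needed to produce the address of the cache-missing load, and discard everything else from the helper thread.

```python
def backward_slice(stmts, target):
    """stmts: list of (dest, set_of_source_vars) in program order.
    Returns the indices of statements that `target` transitively
    depends on -- the pre-execution thread's slice."""
    needed, keep = {target}, []
    for i in range(len(stmts) - 1, -1, -1):   # walk backwards
        dest, srcs = stmts[i]
        if dest in needed:
            needed.discard(dest)
            needed |= srcs                    # pull in its inputs
            keep.append(i)
    return sorted(keep)

# Hypothetical loop body: only "i" feeds the missing load's address;
# "x" and "y" are non-critical work the helper thread can drop.
stmts = [
    ("i",    {"i0"}),
    ("x",    {"big", "heavy"}),
    ("addr", {"base", "i"}),
    ("y",    {"x"}),
]
```

Slicing for `addr` keeps only statements 0 and 2, so the helper thread runs far ahead of the main computation with minimal overhead; prefetch conversion would then turn the sliced load into a non-blocking prefetch.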
Citations: 121
Enabling trusted software integrity
Pub Date : 2002-10-01 DOI: 10.1145/605397.605409
D. Kirovski, M. Drinic, M. Potkonjak
Preventing execution of unauthorized software on a given computer plays a pivotal role in system security. The key problem is that although a program at the beginning of its execution can be verified as authentic, while running, its execution flow can be redirected to externally injected malicious code using, for example, a buffer overflow exploit. Existing techniques address this problem by trying to detect the intrusion at run-time or by formally verifying that the software is not prone to a particular attack. We take a radically different approach to this problem. We aim at intrusion prevention as the core technology for enabling secure computing systems. Intrusion prevention systems force an adversary to solve a computationally hard task in order to create a binary that can be executed on a given machine. In this paper, we present an exemplary system--SPEF--a combination of architectural and compilation techniques that ensure software integrity at run-time. SPEF embeds encrypted, processor-specific constraints into each block of instructions at software installation time and then verifies their existence at run-time. Thus, the processor can execute only properly installed programs, which makes installation the only system gate that needs to be protected. We have designed a SPEF prototype based on the ARM instruction set and validated its impact on security and performance using the MediaBench suite of applications.
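The install-then-verify flow can be approximated with a keyed hash (a hedged analogue only: SPEF actually embeds encrypted constraints into the instruction encoding itself, whereas this sketch tags blocks externally, and the key below is invented): the installer binds each code block to a processor-specific secret, and the "processor" refuses any block whose tag does not verify.

```python
import hashlib
import hmac

PROCESSOR_KEY = b"per-processor-secret"   # hypothetical per-chip key

def install(block):
    """Installation: tag a code block under the processor's key."""
    tag = hmac.new(PROCESSOR_KEY, block, hashlib.sha256).digest()
    return block, tag

def execute(block, tag):
    """Run-time check: execute only blocks installed on this processor."""
    expected = hmac.new(PROCESSOR_KEY, block, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```

A properly installed block verifies, while injected code (as in a buffer-overflow redirect) fails the check, which is the property that makes installation the only gate needing protection.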
Citations: 94
Programming language optimizations for modular router configurations
Pub Date : 2002-10-01 DOI: 10.1145/605397.605424
E. Kohler, R. Morris, Benjie Chen
Networking systems such as Ensemble, the x-kernel, Scout, and Click achieve flexibility by building routers and other packet processors from modular components. Unfortunately, component designs are often slower than purpose-built code, and routers in particular have stringent efficiency requirements. This paper addresses the efficiency problems of one component-based router, Click, through optimization tools inspired in part by compiler optimization passes. This pragmatic approach can result in significant performance improvements; for example, the combination of three optimizations reduces the amount of CPU time Click requires to process a packet in a simple IP router by 34%. We present several optimization tools, describe how those tools affected the design of Click itself, and present detailed evaluations of Click's performance with and without optimization.
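A loose analogue of one such optimization (Click's real passes operate on C++ element source; the element functions here are invented): rather than dispatching through a generic per-element loop on every packet, the configuration can be folded into a single composed function once, at configuration time.

```python
from functools import reduce

# Two toy packet-processing "elements":
def strip_header(pkt):
    return pkt[4:]                       # drop a 4-byte header

def decrement_ttl(pkt):
    return pkt[:-1] + bytes([pkt[-1] - 1])  # last byte models the TTL

def generic_router(elements, pkt):
    """Unoptimized path: indirect dispatch per element, per packet."""
    for el in elements:
        pkt = el(pkt)
    return pkt

def specialize(elements):
    """'Compile-time' pass: fold the pipeline into one function, removing
    the per-packet dispatch loop."""
    return reduce(lambda f, g: (lambda pkt: g(f(pkt))),
                  elements, lambda pkt: pkt)
```

Both paths compute the same result; the specialized pipeline simply fixes the call sequence up front, which is the spirit of trading modular dispatch cost for purpose-built speed.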
Citations: 37
Transactional lock-free execution of lock-based programs
Pub Date : 2002-10-01 DOI: 10.1145/605397.605399
Ravi Rajwar, J. Goodman
This paper is motivated by the difficulty in writing correct high-performance programs. Writing shared-memory multi-threaded programs imposes a complex trade-off between programming ease and performance, largely due to subtleties in coordinating access to shared data. To ensure correctness programmers often rely on conservative locking at the expense of performance. The resulting serialization of threads is a performance bottleneck. Locks also interact poorly with thread scheduling and faults, resulting in poor system performance. We seek to improve multithreaded programming trade-offs by providing architectural support for optimistic lock-free execution. In a lock-free execution, shared objects are never locked when accessed by various threads. We propose Transactional Lock Removal (TLR) and show how a program that uses lock-based synchronization can be executed by the hardware in a lock-free manner, even in the presence of conflicts, without programmer support or software changes. TLR uses timestamps for conflict resolution, modest hardware, and features already present in many modern computer systems. TLR's benefits include improved programmability, stability, and performance. Programmers can obtain benefits of lock-free data structures, such as non-blocking behavior and wait-freedom, while using lock-protected critical sections for writing programs.
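A software caricature of the optimistic pattern (TLR itself works in hardware on unmodified lock-based binaries; this single-threaded sketch only shows the speculate/validate/commit shape, with a timestamp noted where TLR's fairness rule would apply):

```python
import itertools

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.version = 0      # bumped on every committed update

_ts = itertools.count()       # transaction begin timestamps

def deposit(acct, amount, max_retries=10):
    """Run the 'critical section' speculatively, never taking a lock."""
    my_ts = next(_ts)  # in TLR, kept across restarts so the oldest
                       # transaction always eventually wins (wait-freedom)
    for _ in range(max_retries):
        snap_balance, snap_version = acct.balance, acct.version
        new_balance = snap_balance + amount       # speculative body
        if acct.version == snap_version:          # no conflicting commit seen
            acct.balance = new_balance            # commit atomically
            acct.version += 1
            return True
        # conflict: a real TLR processor restarts the younger transaction
    return False
```

In the common, conflict-free case the update commits without any lock traffic, which is where TLR recovers the parallelism that conservative locking serializes away.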
{"title":"Transactional lock-free execution of lock-based programs","authors":"Ravi Rajwar, J. Goodman","doi":"10.1145/605397.605399","DOIUrl":"https://doi.org/10.1145/605397.605399","url":null,"abstract":"This paper is motivated by the difficulty in writing correct high-performance programs. Writing shared-memory multi-threaded programs imposes a complex trade-off between programming ease and performance, largely due to subtleties in coordinating access to shared data. To ensure correctness programmers often rely on conservative locking at the expense of performance. The resulting serialization of threads is a performance bottleneck. Locks also interact poorly with thread scheduling and faults, resulting in poor system performance. We seek to improve multithreaded programming trade-offs by providing architectural support for optimistic lock-free execution. In a lock-free execution, shared objects are never locked when accessed by various threads. We propose Transactional Lock Removal (TLR) and show how a program that uses lock-based synchronization can be executed by the hardware in a lock-free manner, even in the presence of conflicts, without programmer support or software changes. TLR uses timestamps for conflict resolution, modest hardware, and features already present in many modern computer systems. TLR's benefits include improved programmability, stability, and performance. Programmers can obtain benefits of lock-free data structures, such as non-blocking behavior and wait-freedom, while using lock-protected critical sections for writing programs.","PeriodicalId":377379,"journal":{"name":"ASPLOS X","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126144061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 358
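TLR, as the abstract above describes, is a hardware mechanism that executes lock-protected critical sections optimistically and uses timestamps to resolve conflicts. As a purely software analogue of that execute-then-validate discipline (a minimal sketch under invented names; this is not the paper's mechanism, which requires no program changes at all), consider a versioned cell where the work runs without the lock and commits only if no conflicting update intervened:

```python
import threading

class VersionedCell:
    """Software analogue of optimistic lock-free execution: the critical
    section's work runs while holding no lock; a short commit step then
    validates that no other thread committed in the meantime, retrying
    on conflict instead of blocking for the whole critical section."""
    def __init__(self, value=0):
        self._value = value
        self._version = 0
        self._commit_lock = threading.Lock()  # guards only the brief commit

    def update(self, fn):
        while True:
            snap_version = self._version
            snap_value = self._value
            new_value = fn(snap_value)          # speculative work, no lock held
            with self._commit_lock:             # short validate-and-commit window
                if self._version == snap_version:   # no conflicting commit seen
                    self._value = new_value
                    self._version += 1
                    return new_value
            # a concurrent committer won; re-execute against fresh state

cell = VersionedCell()
for _ in range(5):
    cell.update(lambda v: v + 1)
# cell._value == 5
```

In hardware TLR the same effect is achieved transparently for unmodified lock-based code: the lock acquire is elided, the section runs speculatively, and timestamps decide which conflicting thread retries.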
Keynote address: Sensor network research: emerging challenges for architecture, systems, and languages
Pub Date : 2002-10-01 DOI: 10.1145/605397.1090192
D. Estrin
{"title":"Keynote address: Sensor network research: emerging challenges for architecture, systems, and languages","authors":"D. Estrin","doi":"10.1145/605397.1090192","DOIUrl":"https://doi.org/10.1145/605397.1090192","url":null,"abstract":"","PeriodicalId":377379,"journal":{"name":"ASPLOS X","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129763851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8