
Latest publications from ACM Transactions on Computer Systems (TOCS)

A Small-Footprint Accelerator for Large-Scale Neural Networks
Pub Date: 2015-05-22 DOI: 10.1145/2701417
Tian-ping Chen, Shijin Zhang, Shaoli Liu, Zidong Du, Tao Luo, Yuan Gao, Junjie Liu, Dongsheng Wang, Chengyong Wu, Ninghui Sun, Yunji Chen, O. Temam
Machine-learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve toward heterogeneous multicores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neuron output additions) in a small footprint of 3.02 mm² and 485 mW; compared to a 128-bit 2 GHz SIMD processor, the accelerator is 117.87× faster, and it can reduce the total energy by 21.08×. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.
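The 452 GOP/s figure counts the layer's core multiply and add operations. As a rough, hypothetical illustration (plain C, not the accelerator's datapath), the loop below shows the multiply-accumulate pattern for one fully connected layer; the access pattern over the weight array is what makes memory the dominant concern in the article's design.

```c
#include <stdio.h>
#include <stddef.h>

/* Sketch only: the two "key NN operations" counted in the throughput figure
 * (synaptic weight multiplications and neuron output additions) for one
 * fully connected layer. An accelerator pipelines these multiply-accumulates;
 * streaming the weights array is what stresses memory. */
static void fc_layer(const float *in, size_t n_in,
                     const float *weights,      /* n_out x n_in, row-major */
                     float *out, size_t n_out)
{
    for (size_t o = 0; o < n_out; o++) {
        float acc = 0.0f;
        for (size_t i = 0; i < n_in; i++)
            acc += weights[o * n_in + i] * in[i];   /* one multiply + one add */
        out[o] = acc;
    }
}

int main(void)
{
    float in[3] = { 1.0f, 2.0f, 3.0f };
    float w[6]  = { 0.1f, 0.2f, 0.3f,     /* neuron 0 */
                    0.4f, 0.5f, 0.6f };   /* neuron 1 */
    float out[2];
    fc_layer(in, 3, w, out, 2);
    printf("%f %f\n", out[0], out[1]);    /* prints 1.4 and 3.2 */
    return 0;
}
```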
Citations: 10
Fireflies
Pub Date: 2015-05-22 DOI: 10.1145/2701418
H. Johansen, R. V. Renesse, Ymir Vigfusson, D. Johansen
An attacker who controls a computer in an overlay network can effectively control the entire overlay network if the mechanism managing membership information can successfully be targeted. This article describes Fireflies, an overlay network protocol that fights such attacks by organizing members in a verifiable pseudorandom structure so that an intruder cannot incorrectly modify the membership views of correct members. Fireflies provides each member with a view of the entire membership, and supports networks with moderate total churn. We evaluate Fireflies using both simulations and PlanetLab to show that Fireflies is a practical approach for secure membership maintenance in such networks.
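As a loose illustration of the verifiable pseudorandom structure idea (a toy sketch under our own assumptions, not the Fireflies protocol itself), a member's position in ring r can be derived by hashing its identity together with the ring number, so any correct member can recompute and check the neighbors another member claims to have.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Toy sketch (not Fireflies itself): a member's position in ring r is a
 * deterministic hash of its identity and the ring number. Because every
 * node can recompute the mapping, a member cannot misrepresent its ring
 * neighbors without being detected. */
static uint64_t fnv1a(const void *data, size_t len, uint64_t seed)
{
    const uint8_t *p = data;
    uint64_t h = 1469598103934665603ULL ^ seed;   /* FNV offset basis, mixed with seed */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;                    /* FNV prime */
    }
    return h;
}

static uint64_t ring_position(const char *member_id, uint32_t ring)
{
    return fnv1a(member_id, strlen(member_id), ring);
}

int main(void)
{
    /* Any node can verify the position "alice" must occupy in ring 2. */
    printf("alice@ring2 -> %llu\n",
           (unsigned long long)ring_position("alice", 2));
    return 0;
}
```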
Citations: 32
A Differential Approach to Undefined Behavior Detection
Pub Date: 2015-03-11 DOI: 10.1145/2699678
Xi Wang, N. Zeldovich, M. Kaashoek, Armando Solar-Lezama
This article studies undefined behavior arising in systems programming languages such as C/C++. Undefined behavior bugs lead to unpredictable and subtle systems behavior, and their effects can be further amplified by compiler optimizations. Undefined behavior bugs are present in many systems, including the Linux kernel and the Postgres database. The consequences range from incorrect functionality to missing security checks. This article proposes a formal and practical approach that finds undefined behavior bugs by finding “unstable code” in terms of optimizations that leverage undefined behavior. Using this approach, we introduce a new static checker called Stack that precisely identifies undefined behavior bugs. Applying Stack to widely used systems has uncovered 161 new bugs that have been confirmed and fixed by developers.
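A classic instance of such unstable code (illustrative only, not taken from the article's bug reports) is a signed-overflow check: the test itself relies on undefined behavior, so an optimizing compiler may legally remove it.

```c
#include <stdio.h>
#include <limits.h>

/* Illustrative unstable code (not from the article): the programmer intends
 * to detect overflow, but signed overflow is undefined in C, so at -O2 a
 * compiler may assume it cannot happen and delete the check entirely. */
static int offset_ok(int offset, int len)
{
    if (offset + len < offset)   /* undefined if offset + len overflows */
        return 0;                /* intended: reject on overflow */
    return 1;
}

int main(void)
{
    /* With the check optimized away, this call may report "ok". */
    printf("%s\n", offset_ok(INT_MAX - 5, 100) ? "ok" : "overflow");
    return 0;
}
```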
Citations: 19
ISA Wars
Pub Date: 2015-03-11 DOI: 10.1145/2699682
Emily R. Blem, J. Menon, T. Vijayaraghavan, K. Sankaralingam
RISC versus CISC wars raged in the 1980s when chip area and processor design complexity were the primary constraints and desktops and servers exclusively dominated the computing landscape. Today, energy and power are the primary design constraints and the computing landscape is significantly different: Growth in tablets and smartphones running ARM (a RISC ISA) is surpassing that of desktops and laptops running x86 (a CISC ISA). Furthermore, the traditionally low-power ARM ISA is entering the high-performance server market, while the traditionally high-performance x86 ISA is entering the mobile low-power device market. Thus, the question of whether ISA plays an intrinsic role in performance or energy efficiency is becoming important again, and we seek to answer this question through a detailed measurement-based study on real hardware running real applications. We analyze measurements on seven platforms spanning three ISAs (MIPS, ARM, and x86) over workloads spanning mobile, desktop, and server computing. Our methodical investigation demonstrates the role of ISA in modern microprocessors’ performance and energy efficiency. We find that ARM, MIPS, and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant.
Citations: 25
Energy-Oriented Partial Desktop Virtual Machine Migration
Pub Date: 2015-03-11 DOI: 10.1145/2699683
Nilton Bila, Eric J. Wright, E. D. Lara, Kaustubh R. Joshi, H. A. Lagar-Cavilla, Eunbyung Park, Ashvin Goel, M. Hiltunen, M. Satyanarayanan
Modern offices are crowded with personal computers. While studies have shown these to be idle most of the time, they remain powered, consuming up to 60% of their peak power. Hardware-based solutions offered by PC vendors (e.g., low-power states, Wake-on-LAN) have proved unsuccessful because, in spite of user inactivity, these machines often need to remain network active in support of background applications that maintain network presence. Recent proposals have advocated the use of consolidation of idle desktop Virtual Machines (VMs). However, desktop VMs are often large, requiring gigabytes of memory. Consolidating such VMs creates large network transfers lasting on the order of minutes and utilizes server memory inefficiently. When multiple VMs migrate concurrently, networks become congested, and the resulting migration latencies are prohibitive. We present partial VM migration, an approach that transparently migrates only the working set of an idle VM. It creates a partial replica of the desktop VM on the consolidation server by copying only VM metadata, and it transfers pages to the server on demand, as the VM accesses them. This approach places desktop PCs in low-power mode when inactive and switches them to running mode when pages are needed by the VM running on the consolidation server. To ensure that desktops save energy, we have developed sleep scheduling and prefetching algorithms, as well as the context-aware selective resume framework, a novel approach to reduce the latency of power mode transition operations in commodity PCs. Jettison, our software prototype of partial VM migration for off-the-shelf PCs, can deliver 44-91% energy savings during idle periods of at least 10 minutes, while providing low migration latencies of about 4 seconds and migrating minimal state, an order of magnitude smaller than the VM's memory footprint.
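The on-demand page transfer described above can be pictured with a toy sketch (hypothetical structures and names, not Jettison's code): the partial replica starts with metadata only, and each guest page is fetched from the desktop the first time the consolidated VM touches it, so only the working set crosses the network.

```c
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE   4096
#define GUEST_PAGES 1024

/* Toy model: page contents are pulled from the (briefly woken) desktop on
 * first access; subsequent accesses are served from the local replica. */
static char page_cache[GUEST_PAGES][PAGE_SIZE];
static int  page_present[GUEST_PAGES];

static void fetch_from_desktop(int pfn, char *dst)
{
    /* Placeholder for the network fetch that wakes the desktop if needed. */
    memset(dst, 0, PAGE_SIZE);
    printf("fetched page %d on demand\n", pfn);
}

static char *get_page(int pfn)
{
    if (!page_present[pfn]) {
        fetch_from_desktop(pfn, page_cache[pfn]);
        page_present[pfn] = 1;
    }
    return page_cache[pfn];
}

int main(void)
{
    get_page(7);     /* first touch: fetched over the network */
    get_page(7);     /* second touch: served locally, no transfer */
    return 0;
}
```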
Citations: 26
The Scalable Commutativity Rule
Pub Date: 2015-01-20 DOI: 10.1145/2699681
A. Clements, M. Kaashoek, N. Zeldovich, Robert T. Morris, E. Kohler
What opportunities for multicore scalability are latent in software interfaces, such as system call APIs? Can scalability challenges and opportunities be identified even before any implementation exists, simply by considering interface specifications? To answer these questions, we introduce the scalable commutativity rule: whenever interface operations commute, they can be implemented in a way that scales. This rule is useful throughout the development process for scalable multicore software, from the interface design through implementation, testing, and evaluation. This article formalizes the scalable commutativity rule. This requires defining a novel form of commutativity, SIM commutativity, that lets the rule apply even to complex and highly stateful software interfaces. We also introduce a suite of software development tools based on the rule. Our Commuter tool accepts high-level interface models, generates tests of interface operations that commute and hence could scale, and uses these tests to systematically evaluate the scalability of implementations. We apply Commuter to a model of 18 POSIX file and virtual memory system operations. Using the resulting 26,238 scalability tests, Commuter highlights Linux kernel problems previously observed to limit application scalability and identifies previously unknown bottlenecks that may be triggered by future workloads or hardware. Finally, we apply the scalable commutativity rule and Commuter to the design and implementation of sv6, a new POSIX-like operating system. sv6's novel file and virtual memory system designs enable it to scale for 99% of the tests generated by Commuter. These results translate to linear scalability on an 80-core x86 machine for applications built on sv6's commutative operations.
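A concrete instance of the rule, written as an illustration under our own assumptions rather than as an excerpt from Commuter: POSIX requires open() to return the lowest unused file descriptor, so two open() calls in the same process do not commute (reordering them changes the returned descriptors, an observable difference), and by the rule their implementation must coordinate on shared state. Relaxing the lowest-fd requirement would make the calls commute and admit a scalable implementation.

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Illustration (not from the article): the lowest-available-fd rule makes
 * these two open() calls non-commutative, because reordering them changes
 * which descriptor each file receives. By the scalable commutativity rule,
 * the kernel must therefore coordinate on shared state (the fd table),
 * which limits the multicore scalability of this part of the interface. */
int main(void)
{
    int a = open("/tmp/scr_a", O_CREAT | O_WRONLY, 0600);
    int b = open("/tmp/scr_b", O_CREAT | O_WRONLY, 0600);
    printf("/tmp/scr_a -> fd %d, /tmp/scr_b -> fd %d\n", a, b);
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    return 0;
}
```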
Citations: 29
Mechanistic Modeling of Architectural Vulnerability Factor
Pub Date: 2015-01-20 DOI: 10.1145/2669364
Arun A. Nair, Stijn Eyerman, Jian Chen, L. John, L. Eeckhout
Reliability to soft errors is a significant design challenge in modern microprocessors owing to an exponential increase in the number of transistors on chip and the reduction in operating voltages with each process generation. Architectural Vulnerability Factor (AVF) modeling using microarchitectural simulators enables architects to make informed performance, power, and reliability tradeoffs. However, such simulators are time-consuming and do not reveal the microarchitectural mechanisms that influence AVF. In this article, we present an accurate first-order mechanistic analytical model to compute AVF, developed using the first principles of an out-of-order superscalar execution. This model provides insight into the fundamental interactions between the workload and microarchitecture that together influence AVF. We use the model to perform design space exploration, parametric sweeps, and workload characterization for AVF.
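For reference, the quantity being modeled follows the standard definition of AVF, which predates this article: the time-averaged fraction of a structure's bits that are ACE bits, i.e., bits whose corruption would change the architecturally visible outcome.

```latex
\mathrm{AVF} \;=\; \frac{\sum_{i=1}^{B} t_i^{\mathrm{ACE}}}{B \times T}
```

Here B is the number of bits in the structure, T is the number of cycles considered, and t_i^ACE is the number of cycles during which bit i holds ACE state. The article's contribution is estimating these residency terms analytically, from first principles of out-of-order superscalar execution, rather than through slow microarchitectural simulation.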
Citations: 6
Shielding Applications from an Untrusted Cloud with Haven
Pub Date: 2014-10-06 DOI: 10.1145/2799647
Andrew Baumann, Marcus Peinado, G. Hunt
Today’s cloud computing infrastructure requires substantial trust. Cloud users rely on both the provider’s staff and its globally distributed software/hardware platform not to expose any of their private data. We introduce the notion of shielded execution, which protects the confidentiality and integrity of a program and its data from the platform on which it runs (i.e., the cloud operator’s OS, VM, and firmware). Our prototype, Haven, is the first system to achieve shielded execution of unmodified legacy applications, including SQL Server and Apache, on a commodity OS (Windows) and commodity hardware. Haven leverages the hardware protection of Intel SGX to defend against privileged code and physical attacks such as memory probes, and also addresses the dual challenges of executing unmodified legacy binaries and protecting them from a malicious host. This work motivated recent changes in the SGX specification.
Citations: 770
Faults in Linux 2.6
Pub Date: 2014-06-01 DOI: 10.1145/2619090
Nicolas Palix, Gaël Thomas, S. Saha, C. Calvès, Gilles Muller, J. Lawall
In August 2011, Linux entered its third decade. Ten years before, Chou et al. published a study of faults found by applying a static analyzer to Linux versions 1.0 through 2.4.1. A major result of their work was that the drivers directory contained up to 7 times more of certain kinds of faults than other directories. This result inspired numerous efforts to improve the reliability of driver code. Today, Linux is used in a wider range of environments, provides a wider range of services, and has adopted a new development and release model. What has been the impact of these changes on code quality? To answer this question, we have transported Chou et al.'s experiments to all versions of Linux 2.6, released between 2003 and 2011. We find that Linux has more than doubled in size during this period, but the number of faults per line of code has been decreasing. Moreover, the fault rate of drivers is now below that of other directories, such as arch. These results can guide further development and research efforts for the decade to come. To allow updating these results as Linux evolves, we define our experimental protocol and make our checkers available.
Citations: 25
TaintDroid
Pub Date: 2014-06-01 DOI: 10.1145/2619091
W. Enck, Peter Gilbert, Seungyeop Han, Vasant Tendulkar, Byung-Gon Chun, Landon P. Cox, Jaeyeon Jung, P. Mcdaniel, Anmol Sheth
Today’s smartphone operating systems frequently fail to provide users with visibility into how third-party applications collect and share their private data. We address these shortcomings with TaintDroid, an efficient, system-wide dynamic taint tracking and analysis system capable of simultaneously tracking multiple sources of sensitive data. TaintDroid enables realtime analysis by leveraging Android’s virtualized execution environment. TaintDroid incurs only 32% performance overhead on a CPU-bound microbenchmark and imposes negligible overhead on interactive third-party applications. Using TaintDroid to monitor the behavior of 30 popular third-party Android applications, in our 2010 study we found 20 applications potentially misused users’ private information; so did a similar fraction of the tested applications in our 2012 study. Monitoring the flow of privacy-sensitive data with TaintDroid provides valuable input for smartphone users and security service firms seeking to identify misbehaving applications.
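A toy model of the dynamic taint tracking idea (a sketch in plain C under our own naming, not TaintDroid's Dalvik-level implementation): every value carries a taint bitmask, operations OR the taints of their inputs into their output, and sinks such as the network interface check the mask before data leaves the device.

```c
#include <stdio.h>
#include <stdint.h>

/* Toy sketch of dynamic taint tracking (not TaintDroid's code): a value and
 * its taint mask travel together, and taint propagates through operations. */
#define TAINT_LOCATION 0x1
#define TAINT_IMEI     0x2

typedef struct { int32_t value; uint32_t taint; } tvalue;

static tvalue t_add(tvalue a, tvalue b)
{
    /* Propagation rule: the result is tainted by whatever tainted its inputs. */
    return (tvalue){ a.value + b.value, a.taint | b.taint };
}

static void network_send(tvalue v)
{
    /* Sink check: flag any tainted data about to leave the device. */
    if (v.taint)
        printf("ALERT: sending tainted data (mask 0x%x): %d\n", v.taint, v.value);
    else
        printf("sending untainted data: %d\n", v.value);
}

int main(void)
{
    tvalue imei = { 123456, TAINT_IMEI };     /* sensitive source */
    tvalue k    = { 42, 0 };                  /* untainted constant */
    network_send(t_add(imei, k));             /* taint propagates through the add */
    return 0;
}
```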
Citations: 3385