Proceedings of the Eleventh European Conference on Computer Systems: Latest Articles


IFDB: decentralized information flow control for databases

Proceedings of the Eleventh European Conference on Computer Systems

Pub Date : 2013-04-15 DOI: 10.1145/2465351.2465357

David A. Schultz, B. Liskov

Numerous sensitive databases are breached every year due to bugs in applications. These applications typically handle data for many users, and consequently, they have access to large amounts of confidential information. This paper describes IFDB, a DBMS that secures databases by using decentralized information flow control (DIFC). We present the Query by Label model, which introduces new abstractions for managing information flows in a relational database. IFDB also addresses several challenges inherent in bringing DIFC to databases, including how to handle transactions and integrity constraints without introducing covert channels. We implemented IFDB by modifying PostgreSQL, and extended two application environments, PHP and Python, to provide a DIFC platform. IFDB caught several security bugs and prevented information leaks in two web applications we ported to the platform. Our evaluation shows that IFDB's throughput is as good as PostgreSQL for a real web application, and about 1% lower for a database benchmark based on TPC-C.
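The label-based filtering idea can be illustrated with a generic DIFC-style sketch. The abstract does not spell out the Query by Label rules, so everything here is an assumption made for illustration: the `Row` type, the `query` function, the tag names, and the rule that a process only sees tuples whose label is a subset of its own label. It is not IFDB's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    values: dict
    label: frozenset  # hypothetical: secrecy tags attached to this tuple


def query(rows, process_label, predicate=lambda row: True):
    """Return rows the process may see (assumed rule: row label is a subset
    of the process label) that also match the predicate."""
    return [r for r in rows if r.label <= process_label and predicate(r)]


rows = [
    Row({"name": "alice", "diagnosis": "flu"}, frozenset({"alice_medical"})),
    Row({"name": "clinic hours"}, frozenset()),
]

# A process with an empty label sees only the unlabeled row.
public_view = query(rows, frozenset())
# A process that carries the alice_medical tag sees both rows.
private_view = query(rows, frozenset({"alice_medical"}))
```

Under this assumed rule, a buggy application query cannot return tuples the process is not labeled to read, because filtering happens below the application.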


Citations: 70

CPI2: CPU performance isolation for shared compute clusters

Proceedings of the Eleventh European Conference on Computer Systems

Pub Date : 2013-04-15 DOI: 10.1145/2465351.2465388

Xiao Zhang, Eric Tune, R. Hagmann, Rohit Jnagal, Vrigo Gokhale, J. Wilkes

Performance isolation is a key challenge in cloud computing. Unfortunately, Linux has few defenses against performance interference in shared resources such as processor caches and memory buses, so applications in a cloud can experience unpredictable performance caused by other programs' behavior. Our solution, CPI2, uses cycles-per-instruction (CPI) data obtained by hardware performance counters to identify problems, select the likely perpetrators, and then optionally throttle them so that the victims can return to their expected behavior. It automatically learns normal and anomalous behaviors by aggregating data from multiple tasks in the same job. We have rolled out CPI2 to all of Google's shared compute clusters. The paper presents the analysis that led us to that outcome, including both case studies and a large-scale evaluation of its ability to solve real production issues.
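As a rough illustration of the idea, the sketch below computes CPI from two counter values and flags tasks whose CPI is far above that of their peers in the same job. The abstract does not give the detection rule, so the mean-plus-k-standard-deviations threshold, the function names, and the sample numbers are all assumptions.

```python
from statistics import mean, stdev


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction from two hardware counter readings."""
    return cycles / instructions


def find_outliers(samples: dict[str, float], k: float = 2.0) -> list[str]:
    """Return task names whose CPI exceeds mean + k * stdev across the job.

    The threshold rule is an assumption for illustration only.
    """
    values = list(samples.values())
    if len(values) < 2:
        return []  # not enough peers to learn "normal" behavior
    threshold = mean(values) + k * stdev(values)
    return sorted(task for task, value in samples.items() if value > threshold)


# Nine tasks of one job behave alike; one suffers interference.
job_samples = {f"task-{i}": cpi(1_000_000, 1_000_000) for i in range(9)}
job_samples["task-9"] = cpi(3_000_000, 1_000_000)
victims = find_outliers(job_samples)
```

Aggregating across tasks of the same job is what makes a single threshold meaningful here: the tasks run the same binary, so their CPI values are expected to be similar.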


Citations: 324

Conversion: multi-version concurrency control for main memory segments

Proceedings of the Eleventh European Conference on Computer Systems

Pub Date : 2013-04-15 DOI: 10.1145/2465351.2465365

Timothy Merrifield, Jakob Eriksson

We present Conversion, a multi-version concurrency control system for main memory segments. Like the familiar Subversion version control system for files, Conversion provides isolation between processes that each operate on their own working copy. A process retrieves and merges any changes committed to the trunk by calling update(), and a call to commit() pushes any local changes to the trunk. Conversion operations are fast, starting at a few microseconds and growing linearly (by less than 1 μs) with the number of modified pages. This is achieved by leveraging virtual memory hardware, and efficient data structures for keeping track of which pages of memory were modified since the last update. Such extremely low-latency operations make Conversion well suited to a wide variety of concurrent applications. Below, in addition to a micro-benchmark and comparative evaluation, we retrofit Dthreads [28] with a Conversion-based memory model as a case study. This resulted in a speedup (up to 1.75x) for several benchmark programs and reduced the memory management code for Dthreads by 80%.
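The update()/commit() model described above can be mimicked with a toy, single-threaded sketch. This is not how Conversion is implemented (the abstract attributes its speed to virtual memory hardware); the `Trunk` and `WorkingCopy` classes, the per-page version counters, and the rule that locally modified pages are kept on update are assumptions chosen to keep the example small.

```python
class Trunk:
    """Latest committed contents of a segment, tracked page by page."""

    def __init__(self, num_pages: int):
        self.pages = [b""] * num_pages
        self.page_version = [0] * num_pages  # version that last wrote each page
        self.version = 0


class WorkingCopy:
    """A process-private view of the segment (hypothetical model)."""

    def __init__(self, trunk: Trunk):
        self.trunk = trunk
        self.pages = list(trunk.pages)
        self.base_version = trunk.version
        self.dirty = set()  # pages modified locally since the last commit

    def read(self, page: int) -> bytes:
        return self.pages[page]

    def write(self, page: int, data: bytes) -> None:
        self.pages[page] = data
        self.dirty.add(page)

    def update(self) -> None:
        """Pull pages committed to the trunk since our base version."""
        for i, version in enumerate(self.trunk.page_version):
            if version > self.base_version and i not in self.dirty:
                self.pages[i] = self.trunk.pages[i]
        self.base_version = self.trunk.version

    def commit(self) -> None:
        """Push locally modified pages to the trunk as one new version."""
        if not self.dirty:
            return
        self.trunk.version += 1
        for i in self.dirty:
            self.trunk.pages[i] = self.pages[i]
            self.trunk.page_version[i] = self.trunk.version
        self.dirty.clear()
```

Only the dirty set is walked on commit, which mirrors the abstract's claim that cost grows with the number of modified pages rather than with segment size.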


Citations: 45

Improving server applications with system transactions

Proceedings of the Eleventh European Conference on Computer Systems

Pub Date : 2012-04-10 DOI: 10.1145/2168836.2168839

Sangman Kim, Michael Z. Lee, Alan M. Dunn, O. S. Hofmann, Xuan Wang, E. Witchel, Donald E. Porter

Server applications must process requests as quickly as possible. Because some requests depend on earlier requests, there is often a tension between increasing throughput and maintaining the proper semantics for dependent requests. Operating system transactions make it easier to write reliable, high-throughput server applications because they allow the application to execute non-interfering requests in parallel, even if the requests operate on OS state, such as file data. By changing less than 200 lines of application code, we improve performance of a replicated Byzantine Fault Tolerant (BFT) system by up to 88% using server-side speculation, and we improve concurrent performance up to 80% for an IMAP email server by changing only 40 lines. Achieving these results requires substantial enhancements to system transactions, including the ability to pause and resume transactions, and an API to commit transactions in a pre-defined order.
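The idea of committing in a pre-defined order can be sketched in user space with ordinary threads. This is only an analogy, not the operating system transaction API the abstract refers to: the `OrderedCommitter` class and its `commit` method are hypothetical names, and the "transactions" here are just strings appended to a log.

```python
import threading


class OrderedCommitter:
    """Lets work finish in any order but publish results in sequence order."""

    def __init__(self):
        self._next = 0  # sequence number allowed to commit next
        self._cond = threading.Condition()
        self.log = []

    def commit(self, seq: int, result: str) -> None:
        with self._cond:
            # Block until every earlier sequence number has committed.
            self._cond.wait_for(lambda: self._next == seq)
            self.log.append(result)
            self._next += 1
            self._cond.notify_all()


committer = OrderedCommitter()
# Requests finish out of order (2, then 1, then 0) ...
workers = [
    threading.Thread(target=committer.commit, args=(seq, f"reply-{seq}"))
    for seq in (2, 1, 0)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
# ... but committer.log holds the replies in request order.
```

Requests still execute in parallel; only the final publication step is serialized, which is the property dependent requests need.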


Citations: 13

CheapBFT: resource-efficient byzantine fault tolerance

Proceedings of the Eleventh European Conference on Computer Systems

Pub Date : 2012-04-10 DOI: 10.1145/2168836.2168866

R. Kapitza, J. Behl, C. Cachin, T. Distler, Simon Kuhnle, Seyed Vahid Mohammadi, Wolfgang Schröder-Preikschat, Klaus Stengel

One of the main reasons why Byzantine fault-tolerant (BFT) systems are not widely used lies in their high resource consumption: 3f+1 replicas are necessary to tolerate only f faults. Recent works have been able to reduce the minimum number of replicas to 2f+1 by relying on a trusted subsystem that prevents a replica from making conflicting statements to other replicas without being detected. Nevertheless, having been designed with the focus on fault handling, these systems still employ a majority of replicas during normal-case operation for seemingly redundant work. Furthermore, the trusted subsystems available trade off performance for security; that is, they either achieve high throughput or they come with a small trusted computing base. This paper presents CheapBFT, a BFT system that, for the first time, tolerates that all but one of the replicas active in normal-case operation become faulty. CheapBFT runs a composite agreement protocol and exploits passive replication to save resources; in the absence of faults, it requires that only f+1 replicas actively agree on client requests and execute them. In case of suspected faulty behavior, CheapBFT triggers a transition protocol that activates f extra passive replicas and brings all non-faulty replicas into a consistent state again. This approach, for example, allows the system to safely switch to another, more resilient agreement protocol. CheapBFT relies on an FPGA-based trusted subsystem for the authentication of protocol messages that provides high performance and comprises a small trusted computing base.
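The replica counts quoted in the abstract are easy to tabulate for a given f. The function and dictionary key names below are made up for illustration; the formulas themselves (3f+1, 2f+1, f+1 active, f passive) come straight from the text.

```python
def replica_counts(f: int) -> dict[str, int]:
    """Replica counts needed to tolerate f faults, per the abstract."""
    return {
        "traditional_bft_total": 3 * f + 1,
        "trusted_subsystem_total": 2 * f + 1,
        "cheapbft_active_normal_case": f + 1,
        "cheapbft_passive_normal_case": f,
    }


for f in (1, 2, 3):
    print(f, replica_counts(f))
```

The active and passive CheapBFT counts add up to 2f+1, so the savings come from how many replicas do agreement and execution work in the fault-free case, not from the total number deployed.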


Citations: 223

Fast black-box testing of system recovery code

Proceedings of the Eleventh European Conference on Computer Systems

Pub Date : 2012-04-10 DOI: 10.1145/2168836.2168865

Radu Banabic, George Candea

Fault injection---a key technique for testing the robustness of software systems---ends up rarely being used in practice, because it is labor-intensive and one needs to choose between performing random injections (which leads to poor coverage and low representativeness) or systematic testing (which takes a long time to wade through large fault spaces). As a result, testers of systems with high reliability requirements, such as MySQL, perform fault injection in an ad-hoc manner, using explicitly-coded injection statements in the base source code and manual triggering of failures. This paper introduces AFEX, a technique and tool for automating the entire fault injection process, from choosing the faults to inject, to setting up the environment, performing the injections, and finally characterizing the results of the tests (e.g., in terms of impact, coverage, and redundancy). The AFEX approach uses a metric-driven search algorithm that aims to maximize the number of bugs discovered in a fixed amount of time. We applied AFEX to real-world systems---MySQL, Apache httpd, UNIX utilities, and MongoDB---and it uncovered new bugs automatically in considerably less time than other black-box approaches.


Citations: 41

Improving interrupt response time in a verifiable protected microkernel

Proceedings of the Eleventh European Conference on Computer Systems

Pub Date : 2012-04-10 DOI: 10.1145/2168836.2168869

Bernard Blackham, Yao Shi, G. Heiser

Many real-time operating systems (RTOSes) offer very small interrupt latencies, in the order of tens or hundreds of cycles. They achieve this by making the RTOS kernel fully preemptible, permitting interrupts at almost any point in execution except for some small critical sections. One drawback of this approach is that it is difficult to reason about or formally model the kernel's behavior for verification, especially when written in a low-level language such as C. An alternate model for an RTOS kernel is to permit interrupts at specific preemption points only. This controls the possible interleavings and enables the use of techniques such as formal verification or model checking. Although this model cannot (yet) obtain the small interrupt latencies achievable with a fully-preemptible kernel, it can still achieve worst-case latencies in the range of 10,000s to 100,000s of cycles. As modern embedded CPUs enter the 1 GHz range, such latencies become acceptable for more applications, particularly when they come with the additional benefit of simplicity and formal models. This is particularly attractive for protected multitasking microkernels, where the (inherently non-preemptible) kernel entry and exit costs dominate the latencies of many system calls. This paper explores how to reduce the worst-case interrupt latency in a (mostly) non-preemptible protected kernel, and still maintain the ability to apply formal methods for analysis. We use the formally-verified seL4 microkernel as a case study and demonstrate that it is possible to achieve reasonable response-time guarantees. By combining short predictable interrupt latencies with formal verification, a design such as seL4's creates a compelling platform for building mixed-criticality real-time systems.
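To put the cycle counts above in wall-clock terms, the conversion at a 1 GHz clock is a one-line calculation. The helper name is an assumption; the cycle counts and the clock rate are the ones mentioned in the abstract.

```python
def cycles_to_microseconds(cycles: int, clock_hz: int) -> float:
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles * 1_000_000 / clock_hz


ONE_GHZ = 1_000_000_000
for cycles in (100, 10_000, 100_000):
    print(f"{cycles} cycles at 1 GHz = {cycles_to_microseconds(cycles, ONE_GHZ)} us")
```

At 1 GHz, worst-case latencies of 10,000 to 100,000 cycles correspond to 10 to 100 microseconds, while the hundreds of cycles offered by fully preemptible kernels correspond to a fraction of a microsecond.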


Citations: 31

Isolating commodity hosted hypervisors with HyperLock

Proceedings of the Eleventh European Conference on Computer Systems

Pub Date : 2012-04-10 DOI: 10.1145/2168836.2168850

Zhi Wang, Chiachih Wu, Michael C. Grace, Xuxian Jiang

Hosted hypervisors (e.g., KVM) are being widely deployed. One key reason is that they can effectively take advantage of the mature features and broad user bases of commodity operating systems. However, they are not immune to exploitable software bugs. Particularly, due to the close integration with the host and the unique presence underneath guest virtual machines, a hosted hypervisor -- if compromised -- can also jeopardize the host system and completely take over all guests in the same physical machine. In this paper, we present HyperLock, a systematic approach to strictly isolate privileged, but potentially vulnerable, hosted hypervisors from compromising the host OSs. Specifically, we provide a secure hypervisor isolation runtime with its own separated address space and a restricted instruction set for safe execution. In addition, we propose another technique, i.e., hypervisor shadowing, to efficiently create a separate shadow hypervisor and pair it with each guest so that a compromised hypervisor can affect only the paired guest, not others. We have built a proof-of-concept HyperLock prototype to confine the popular KVM hypervisor on Linux. Our results show that HyperLock has a much smaller (12%) trusted computing base (TCB) than the original KVM. Moreover, our system completely removes QEMU, the companion user program of KVM (with >531K SLOC), from the TCB. The security experiments and performance measurements also demonstrated the practicality and effectiveness of our approach.


Citations: 70

Improving network connection locality on multicore systems

Proceedings of the Eleventh European Conference on Computer Systems

Pub Date : 2012-04-10 DOI: 10.1145/2168836.2168870

A. Pesterev, Jacob Strauss, N. Zeldovich, R. Morris

Incoming and outgoing processing for a given TCP connection often execute on different cores: an incoming packet is typically processed on the core that receives the interrupt, while outgoing data processing occurs on the core running the relevant user code. As a result, accesses to read/write connection state (such as TCP control blocks) often involve cache invalidations and data movement between cores' caches. These can take hundreds of processor cycles, enough to significantly reduce performance. We present a new design, called Affinity-Accept, that causes all processing for a given TCP connection to occur on the same core. Affinity-Accept arranges for the network interface to determine the core on which application processing for each new connection occurs, in a lightweight way; it adjusts the card's choices only in response to imbalances in CPU scheduling. Measurements show that for the Apache web server serving static files on a 48-core AMD system, Affinity-Accept reduces time spent in the TCP stack by 30% and improves overall throughput by 24%.
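One way to picture connection-to-core steering is a small table that maps hashed connections to cores and is changed only when load is uneven. The sketch below is a software model under assumed details: the CRC32 hash, the flow-group table, the table sizes, and the one-group-at-a-time rebalancing rule are illustrative choices, not the design described in the paper.

```python
import zlib

NUM_CORES = 4
NUM_FLOW_GROUPS = 16  # hypothetical: connections are hashed into groups

# Steering table: flow group -> core. Initially spread round-robin.
steering = [group % NUM_CORES for group in range(NUM_FLOW_GROUPS)]


def flow_group(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> int:
    """Hash a connection's 4-tuple into a flow group (deterministic)."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % NUM_FLOW_GROUPS


def core_for(conn: tuple) -> int:
    """Core that handles both packet and application work for a connection."""
    return steering[flow_group(*conn)]


def rebalance(table: list, load: list) -> None:
    """Move one flow group from the busiest core to the least busy core."""
    busiest = max(range(len(load)), key=load.__getitem__)
    idlest = min(range(len(load)), key=load.__getitem__)
    if busiest == idlest:
        return
    for group, core in enumerate(table):
        if core == busiest:
            table[group] = idlest
            return


conn = ("10.0.0.7", 51234, "10.0.0.1", 80)
core = core_for(conn)  # every packet of this connection maps to this core
```

Because the mapping depends only on the connection's addresses and ports, all processing for one connection stays on one core until a rebalance moves its group.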


Citations: 134

MadLINQ: large-scale distributed matrix computation for the cloud

Proceedings of the Eleventh European Conference on Computer Systems

Pub Date : 2012-04-10 DOI: 10.1145/2168836.2168857

Zhengping Qian, Xiuwei Chen, Nanxi Kang, Mingcheng Chen, Yuan Yu, T. Moscibroda, Zheng Zhang

The computation core of many data-intensive applications can be best expressed as matrix computations. The MadLINQ project addresses the following two important research problems: the need for a highly scalable, efficient and fault-tolerant matrix computation system that is also easy to program, and the seamless integration of such specialized execution engines in a general purpose data-parallel computing system. MadLINQ exposes a unified programming model to both matrix algorithm and application developers. Matrix algorithms are expressed as sequential programs operating on tiles (i.e., sub-matrices). For application developers, MadLINQ provides a distributed matrix computation library for .NET languages. Via the LINQ technology, MadLINQ also seamlessly integrates with DryadLINQ, a data-parallel computing system focusing on relational algebra. The system automatically handles the parallelization and distributed execution of programs on a large cluster. It outperforms current state-of-the-art systems by employing two key techniques, both of which are enabled by the matrix abstraction: exploiting extra parallelism using fine-grained pipelining and efficient on-demand failure recovery using a distributed fault-tolerant execution engine. We describe the design and implementation of MadLINQ and evaluate system performance using several real-world applications.
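The phrase "sequential programs operating on tiles" can be made concrete with a plain tiled matrix multiplication. This is a generic, single-machine sketch in Python rather than MadLINQ's .NET library; the helper names are assumptions, and the code assumes square matrices whose size is a multiple of the tile size.

```python
def matmul(a, b):
    """Naive dense multiply of two small matrices (lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m, t):
    """Cut a square matrix into a grid of t-by-t tiles."""
    n = len(m)
    return [[[row[j:j + t] for row in m[i:i + t]] for j in range(0, n, t)]
            for i in range(0, n, t)]


def join(tiles):
    """Reassemble a grid of tiles into one matrix."""
    rows = []
    for tile_row in tiles:
        for r in range(len(tile_row[0])):
            rows.append([x for tile in tile_row for x in tile[r]])
    return rows


def tiled_matmul(a, b, t):
    """Multiply by looping over tiles: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    ta, tb = split(a, t), split(b, t)
    g = len(ta)
    tc = [[None] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            acc = matmul(ta[i][0], tb[0][j])
            for k in range(1, g):
                acc = add(acc, matmul(ta[i][k], tb[k][j]))
            tc[i][j] = acc
    return join(tc)
```

Each tile product in the inner loop is an independent unit of work, which is the kind of structure a runtime could schedule and pipeline across machines.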


Citations: 69
