Proceedings of the Eleventh European Conference on Computer Systems: latest publications

Type-aware transactions for faster concurrent code
Pub Date : 2016-04-18 DOI: 10.1145/2901318.2901348
Nathaniel Herman, J. Inala, Yihe Huang, Lillian Tsai, E. Kohler, B. Liskov, L. Shrira
It is often possible to improve a concurrent system's performance by leveraging the semantics of its datatypes. We build a new software transactional memory (STM) around this observation. A conventional STM tracks read- and write-sets of memory words; even simple operations can generate large sets. Our STM, which we call STO, tracks abstract operations on transactional datatypes instead. Parts of the transactional commit protocol are delegated to these datatypes' implementations, which can use datatype semantics, and new commit protocol features, to reduce bookkeeping, limit false conflicts, and implement efficient concurrency control. We test these ideas on the STAMP benchmark suite for STM applications and on our own prior work, the Silo high-performance in-memory database, observing large performance improvements in both systems.
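A toy sketch of the core idea (the class names and commit API here are invented for illustration, not STO's actual interface): a transaction records abstract operations on a datatype instead of word-level read/write sets, so commuting operations such as increments can commit without false conflicts.

```python
import threading

class TxCounter:
    """A transactional counter whose commit logic lives in the datatype."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def apply(self, ops):
        # Commit is delegated to the datatype: increments commute, so a
        # batch of them installs without validating against other readers.
        with self._lock:
            for delta in ops:
                self._value += delta

class Transaction:
    def __init__(self, counter):
        self.counter = counter
        self.pending = []          # abstract op log, not a write-set

    def increment(self, delta=1):
        self.pending.append(delta)  # record the operation, touch no memory

    def commit(self):
        self.counter.apply(self.pending)
        self.pending = []

c = TxCounter()
t1, t2 = Transaction(c), Transaction(c)
t1.increment()
t2.increment(2)
t1.commit()
t2.commit()                        # neither aborts: increments commute
```

In a word-granularity STM both transactions would read and write the same counter word and one would abort; logging the operation instead lets both commit.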
Citations: 45
The Linux scheduler: a decade of wasted cores
Pub Date : 2016-04-18 DOI: 10.1145/2901318.2901326
Jean-Pierre Lozi, Baptiste Lepers, Justin R. Funston, Fabien Gaud, Vivien Quéma, Alexandra Fedorova
As a central part of resource management, the OS thread scheduler must maintain the following simple invariant: make sure that ready threads are scheduled on available cores. As simple as it may seem, we found that this invariant is often broken in Linux. Cores may stay idle for seconds while ready threads are waiting in runqueues. In our experiments, these performance bugs caused many-fold performance degradation for synchronization-heavy scientific applications, 13% higher latency for kernel make, and a 14-23% decrease in TPC-H throughput for a widely used commercial database. The main contribution of this work is the discovery and analysis of these bugs and providing the fixes. Conventional testing techniques and debugging tools are ineffective at confirming or understanding this kind of bug, because the symptoms are often evasive. To drive our investigation, we built new tools that check online for violations of the invariant and visualize scheduling activity. They are simple, easily portable across kernel versions, and run with negligible overhead. We believe that making these tools part of the kernel developers' tool belt can help keep this type of bug at bay.
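The invariant check the authors describe can be stated in a few lines. This sketch is illustrative only (the paper's tools inspect live kernel state, not Python dictionaries): a violation exists whenever at least one core is idle while some runqueue holds a ready thread.

```python
def check_invariant(runqueues, idle_cores):
    """Return True if the work-conservation invariant is violated.

    runqueues: dict mapping core id -> list of ready-but-waiting threads
    idle_cores: set of core ids currently idle
    """
    # A ready thread waits somewhere while a core sits idle: violation.
    waiting = sum(len(q) for q in runqueues.values())
    return waiting > 0 and len(idle_cores) > 0

# Core 1 idles while thread T1 waits on core 0's runqueue: broken.
check_invariant({0: ["T1"], 1: []}, idle_cores={1})
```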
Citations: 146
An efficient design and implementation of LSM-tree based key-value store on open-channel SSD
Pub Date : 2014-04-14 DOI: 10.1145/2592798.2592804
Peng Wang, Guangyu Sun, Song Jiang, Jian Ouyang, Shiding Lin, Chen Zhang, J. Cong
Various key-value (KV) stores are widely employed for data management to support Internet services as they offer higher efficiency, scalability, and availability than relational database systems. The log-structured merge tree (LSM-tree) based KV stores have attracted growing attention because they can eliminate random writes and maintain acceptable read performance. Recently, as the price per unit capacity of NAND flash decreases, solid state disks (SSDs) have been extensively adopted in enterprise-scale data centers to provide high I/O bandwidth and low access latency. However, it is inefficient to naively combine LSM-tree-based KV stores with SSDs, as the high parallelism enabled within the SSD cannot be fully exploited. Current LSM-tree-based KV stores are designed without assuming SSD's multi-channel architecture. To address this inadequacy, we propose LOCS, a system equipped with a customized SSD design, which exposes its internal flash channels to applications, to work with the LSM-tree-based KV store, specifically LevelDB in this work. We extend LevelDB to explicitly leverage the multiple channels of an SSD to exploit its abundant parallelism. In addition, we optimize scheduling and dispatching policies for concurrent I/O requests to further improve the efficiency of data access. Compared with the scenario where a stock LevelDB runs on a conventional SSD, the throughput of storage system can be improved by more than 4X after applying all proposed optimization techniques.
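The dispatching idea can be sketched as follows (an illustration under assumed names, not the LOCS implementation): with the SSD's flash channels exposed, the host spreads I/O requests over per-channel queues, here with a least-loaded policy so no channel idles while others back up.

```python
class ChannelDispatcher:
    """Spread requests over an open-channel SSD's exposed flash channels."""

    def __init__(self, n_channels):
        self.queues = [[] for _ in range(n_channels)]  # pending per channel

    def dispatch(self, request):
        # Least-weighted-queue policy: the shortest pending queue wins,
        # keeping all channels busy to exploit internal parallelism.
        target = min(range(len(self.queues)),
                     key=lambda i: len(self.queues[i]))
        self.queues[target].append(request)
        return target

d = ChannelDispatcher(2)
d.dispatch("sstable-0")   # goes to channel 0
d.dispatch("sstable-1")   # channel 0 now busy, goes to channel 1
d.dispatch("sstable-2")   # both tied, ties break toward channel 0
```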
Citations: 180
Kronos: the design and implementation of an event ordering service
Pub Date : 2014-04-14 DOI: 10.1145/2592798.2592822
Robert Escriva, Ayush Dubey, B. Wong, E. G. Sirer
This paper proposes a new approach to determining the order of interdependent operations in a distributed system. The key idea behind our approach is to factor the task of tracking happens-before relationships out of components that comprise the system, and to centralize them in a separate event ordering service. This not only simplifies implementation of individual components by freeing them from having to propagate dependence information, but also enables dependence relationships to be maintained across multiple independent systems. A novel API enables the system to detect and take advantage of concurrency whenever possible by maintaining fine-grained information and binding events to a time order as late as possible. We demonstrate the benefits of this approach through several example applications, including a transactional key-value store, and an online graph store. Experiments show that our event ordering service scales well and has low overhead in practice.
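A minimal sketch of such a service (the API below is invented for illustration, not Kronos's actual interface): clients create events and assert happens-before edges; the service accepts an edge only if it would not create a cycle, leaving unrelated events unordered, i.e. concurrent, for as long as possible.

```python
class OrderingService:
    """Centralized happens-before tracking over a growing event DAG."""

    def __init__(self):
        self.succ = {}                      # event id -> set of later events

    def create_event(self):
        e = len(self.succ)
        self.succ[e] = set()
        return e

    def _reaches(self, a, b):
        # Depth-first search: does a happens-before path a -> ... -> b exist?
        stack, seen = [a], set()
        while stack:
            x = stack.pop()
            if x == b:
                return True
            if x not in seen:
                seen.add(x)
                stack.extend(self.succ[x])
        return False

    def assert_order(self, a, b):
        # Bind "a happens before b"; refuse if b already precedes a.
        if self._reaches(b, a):
            return False
        self.succ[a].add(b)
        return True
```

Because edges are added only when a client needs them, two events with no path between them stay concurrent, which is exactly the slack the paper exploits.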
Citations: 28
Snapshots in a flash with ioSnap
Pub Date : 2014-04-14 DOI: 10.1145/2592798.2592825
Sriram Subramanian, S. Sundararaman, Nisha Talagala, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Snapshots are a common and heavily relied-upon feature in storage systems. The high performance of flash-based storage systems brings new, more stringent, requirements for this classic capability. We present ioSnap, a flash-optimized snapshot system. Through careful design exploiting common snapshot usage patterns and flash-oriented optimizations, including leveraging native characteristics of Flash Translation Layers, ioSnap delivers low-overhead snapshots with minimal disruption to foreground traffic. Through our evaluation, we show that ioSnap incurs negligible performance overhead during normal operation, and that common-case operations such as snapshot creation and deletion incur little cost. We also demonstrate techniques to mitigate the performance impact on foreground I/O during intensive snapshot operations such as activation. Overall, ioSnap represents a case study of how to integrate snapshots into a modern, well-engineered flash-based storage system.
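As background for why snapshots can be cheap on flash, here is a toy copy-on-write snapshot map (purely illustrative; it is not ioSnap's design, and a real FTL would share mapping entries rather than copy a dictionary): a snapshot freezes the block map, and later writes only update the live map.

```python
class SnapshotStore:
    """Toy block store where snapshots freeze the address map."""

    def __init__(self):
        self.live = {}              # block address -> data
        self.snapshots = []         # frozen address maps

    def write(self, addr, data):
        self.live[addr] = data      # log-structured: old data stays put

    def snapshot(self):
        # Copying the dict stands in for sharing the FTL's mapping table;
        # on real hardware this is the cheap, metadata-only step.
        self.snapshots.append(dict(self.live))
        return len(self.snapshots) - 1

    def read_snapshot(self, sid, addr):
        return self.snapshots[sid].get(addr)
```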
Citations: 28
Algorithmic improvements for fast concurrent Cuckoo hashing
Pub Date : 2014-04-14 DOI: 10.1145/2592798.2592820
Xiaozhou Li, D. Andersen, M. Kaminsky, M. Freedman
Fast concurrent hash tables are an increasingly important building block as we scale systems to greater numbers of cores and threads. This paper presents the design, implementation, and evaluation of a high-throughput and memory-efficient concurrent hash table that supports multiple readers and writers. The design arises from careful attention to systems-level optimizations such as minimizing critical section length and reducing interprocessor coherence traffic through algorithm re-engineering. As part of the architectural basis for this engineering, we include a discussion of our experience and results adopting Intel's recent hardware transactional memory (HTM) support to this critical building block. We find that naively allowing concurrent access using a coarse-grained lock on existing data structures reduces overall performance with more threads. While HTM mitigates this slowdown somewhat, it does not eliminate it. Algorithmic optimizations that benefit both HTM and designs for fine-grained locking are needed to achieve high performance. Our performance results demonstrate that our new hash table design---based around optimistic cuckoo hashing---outperforms other optimized concurrent hash tables by up to 2.5x for write-heavy workloads, even while using substantially less memory for small key-value items. On a 16-core machine, our hash table executes almost 40 million insert and more than 70 million lookup operations per second.
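For readers unfamiliar with the underlying scheme, here is a single-threaded toy of two-way cuckoo hashing (the paper's concurrency control, HTM use, and fine-grained locking are deliberately elided): each key has two candidate buckets, and a full table makes room by displacing an occupant to its alternate bucket.

```python
class CuckooTable:
    """Toy two-choice cuckoo hash table with displacement on insert."""

    def __init__(self, n=8):
        self.n = n
        self.slots = [None] * n     # each slot holds one (key, value)

    def _buckets(self, key):
        h = hash(key)
        return h % self.n, (h // self.n) % self.n

    def get(self, key):
        for b in self._buckets(key):
            if self.slots[b] is not None and self.slots[b][0] == key:
                return self.slots[b][1]
        return None

    def put(self, key, value, max_kicks=32):
        entry = (key, value)
        for _ in range(max_kicks):
            for b in self._buckets(entry[0]):
                if self.slots[b] is None or self.slots[b][0] == entry[0]:
                    self.slots[b] = entry
                    return True
            # Both candidate buckets full: evict one occupant and carry
            # it forward to be re-inserted into its alternate bucket.
            b = self._buckets(entry[0])[0]
            self.slots[b], entry = entry, self.slots[b]
        return False                # kick chain too long: table needs resizing
```

The "optimistic" part of the paper's design concerns letting readers proceed without locks while such displacement chains are in flight; that machinery is what the sketch omits.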
Citations: 161
Rex: replication at the speed of multi-core
Pub Date : 2014-04-14 DOI: 10.1145/2592798.2592800
Zhenyu Guo, C. Hong, Mao Yang, Dong Zhou, Lidong Zhou, Li Zhuang
Standard state-machine replication involves consensus on a sequence of totally ordered requests through, for example, the Paxos protocol. Such a sequential execution model is becoming outdated on prevalent multi-core servers. Highly concurrent executions on multi-core architectures introduce non-determinism related to thread scheduling and lock contentions, and fundamentally break the assumption in state-machine replication. This tension between concurrency and consistency is not inherent because the total-ordering of requests is merely a simplifying convenience that is unnecessary for consistency. Concurrent executions of the application can be decoupled with a sequence of consensus decisions through consensus on partial-order traces, rather than on totally ordered requests, that capture the non-deterministic decisions in one replica execution and to be replayed with the same decisions on others. The result is a new multi-core friendly replicated state-machine framework that achieves strong consistency while preserving parallelism in multi-thread applications. On 12-core machines with hyper-threading, evaluations on typical applications show that we can scale with the number of cores, achieving up to 16 times the throughput of standard replicated state machines.
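The execute-then-replay idea can be reduced to a small sketch (illustrative only; Rex records real lock-acquisition and scheduling decisions, not a Python list): the primary executes concurrently and records the nondeterministic order in which requests won access to shared state, and replicas replay enforcing that same order.

```python
def execute(requests, grant_order=None):
    """Apply requests to state in the given nondeterministic grant order.

    With grant_order=None this models free concurrent execution on the
    primary; passing a recorded trace models deterministic replay.
    """
    state, trace = [], []
    order = grant_order if grant_order is not None else list(range(len(requests)))
    for i in order:                 # "lock grants" happen in this order
        state.append(requests[i])
        trace.append(i)
    return state, trace

# Primary happens to grant access in order 2, 0, 1; the trace captures it.
primary_state, trace = execute(["a", "b", "c"], grant_order=[2, 0, 1])
# A replica replays the same trace and converges to identical state.
replica_state, _ = execute(["a", "b", "c"], grant_order=trace)
```

Consensus then runs over these compact partial-order traces rather than over a total order of requests, which is how parallelism survives replication.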
Citations: 72
Aerie: flexible file-system interfaces to storage-class memory
Pub Date : 2014-04-14 DOI: 10.1145/2592798.2592810
Haris Volos, Sanketh Nalli, S. Panneerselvam, V. Varadarajan, Prashant Saxena, M. Swift
Storage-class memory technologies such as phase-change memory and memristors present a radically different interface to storage than existing block devices. As a result, they provide a unique opportunity to re-examine storage architectures. We find that the existing kernel-based stack of components, well suited for disks, unnecessarily limits the design and implementation of file systems for this new technology. We present Aerie, a flexible file-system architecture that exposes storage-class memory to user-mode programs so they can access files without kernel interaction. Aerie can implement a generic POSIX-like file system with performance similar to or better than a kernel implementation. The main benefit of Aerie, though, comes from enabling applications to optimize the file system interface. We demonstrate a specialized file system that reduces a hierarchical file system abstraction to a key/value store with fewer consistency guarantees but 20-109% higher performance than a kernel file system.
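The specialized file system the authors mention can be caricatured in a few lines (the class and methods below are invented for illustration; Aerie itself is a user-mode C/C++ architecture over storage-class memory): the hierarchical namespace collapses into a flat key/value map, and directory listing becomes a prefix scan.

```python
class FlatFS:
    """Hierarchical namespace reduced to a flat path -> bytes map."""

    def __init__(self):
        self.store = {}             # full path string -> file contents

    def put(self, path, data):
        self.store[path] = data     # no inodes, no directory entries

    def get(self, path):
        return self.store.get(path)

    def list_dir(self, prefix):
        # A directory listing is just a prefix scan over the keys.
        if not prefix.endswith("/"):
            prefix += "/"
        return sorted(k for k in self.store if k.startswith(prefix))
```

Dropping directories and their consistency guarantees is what shortens the code path; the paper quantifies that trade as 20-109% over a kernel file system.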
Citations: 200
DIBS: just-in-time congestion mitigation for data centers
Pub Date : 2014-04-14 DOI: 10.1145/2592798.2592806
K. Zarifis, Rui Miao, Matt Calder, Ethan Katz-Bassett, Minlan Yu, J. Padhye
Data centers must support a range of workloads with differing demands. Although existing approaches handle routine traffic smoothly, intense hotspots--even if ephemeral--cause excessive packet loss and severely degrade performance. This loss occurs even though congestion is typically highly localized, with spare buffer capacity at nearby switches. In this paper, we argue that switches should share buffer capacity to effectively handle this spot congestion without the monetary hit of deploying large buffers at individual switches. Specifically, we present detour-induced buffer sharing (DIBS), a mechanism that achieves a near lossless network without requiring additional buffers at individual switches. Using DIBS, a congested switch detours packets randomly to neighboring switches to avoid dropping the packets. We implement DIBS in hardware, on software routers in a testbed, and in simulation, and we demonstrate that it reduces the 99th percentile of delay-sensitive query completion time by up to 85%, with very little impact on other traffic.
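The detour mechanism is simple enough to model directly (a toy, with invented names; real DIBS operates on switch hardware and must also handle loops and reordering): a switch whose buffer is full forwards the packet to a random neighbor with spare capacity instead of dropping it.

```python
import random

class Switch:
    """Toy switch that detours packets to neighbors when its buffer fills."""

    def __init__(self, name, capacity):
        self.name, self.capacity = name, capacity
        self.buffer = []
        self.neighbors = []

    def enqueue(self, pkt):
        if len(self.buffer) < self.capacity:
            self.buffer.append(pkt)
            return self.name                    # buffered locally
        # Buffer full: borrow a random neighbor's spare buffer capacity.
        spare = [n for n in self.neighbors if len(n.buffer) < n.capacity]
        if spare:
            return random.choice(spare).enqueue(pkt)
        return None                             # genuinely no capacity: drop

s1, s2 = Switch("s1", capacity=1), Switch("s2", capacity=2)
s1.neighbors = [s2]
s1.enqueue("p1")    # fits at s1
s1.enqueue("p2")    # s1 full, detoured to s2 instead of being dropped
```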
引用次数: 34
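The DIBS abstract above describes one core move: when a switch's buffer is full, detour the packet to a random neighbor instead of dropping it. The toy model below sketches that idea only; it is not the paper's implementation. The `Switch` class, its `capacity`, and the per-packet `ttl` field (added here to keep detours from looping forever among mutually congested switches) are all illustrative assumptions.

```python
import random
from collections import deque


class Switch:
    """Toy model of a DIBS-style switch (illustrative, not from the paper)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity        # local buffer size in packets
        self.queue = deque()
        self.neighbors = []             # wired up after construction
        self.detoured = 0
        self.dropped = 0

    def receive(self, pkt):
        if len(self.queue) < self.capacity:
            self.queue.append(pkt)      # normal path: buffer locally
        elif self.neighbors and pkt["ttl"] > 0:
            # DIBS: borrow a neighbor's spare buffer by detouring the
            # packet to a randomly chosen neighbor instead of dropping it.
            pkt["ttl"] -= 1
            self.detoured += 1
            random.choice(self.neighbors).receive(pkt)
        else:
            self.dropped += 1           # no reachable spare capacity left
```

With a congested switch (capacity 2) wired to one uncongested neighbor, sending five packets leaves two buffered locally and three detoured to the neighbor, with zero drops — the spot-congestion case the abstract describes.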
TAQ: enhancing fairness and performance predictability in small packet regimes
Pub Date : 2014-04-14 DOI: 10.1145/2592798.2592819
Jay Chen, L. Subramanian, J. Iyengar, B. Ford
TCP congestion control algorithms implicitly assume that the per-flow throughput is at least a few packets per round trip time. Environments where this assumption does not hold, which we refer to as small packet regimes, are common in the contexts of wired and cellular networks in developing regions. In this paper we show that in small packet regimes TCP flows experience severe unfairness, high packet loss rates, and flow silences due to repetitive timeouts. We propose an approximate Markov model to describe TCP behavior in small packet regimes to characterize the TCP breakdown region that leads to repetitive timeout behavior. To enhance TCP performance in such regimes, we propose Timeout Aware Queuing (TAQ), a readily deployable in-network middlebox approach that uses a multi-level adaptive priority queuing algorithm to reduce the probability of timeouts, improve fairness and performance predictability. We demonstrate the effectiveness of TAQ across a spectrum of small packet regime network conditions using simulations, a prototype implementation, and testbed experiments.
Citations: 13
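The TAQ abstract above centers on a multi-level adaptive priority queue that serves timeout-prone flows first. The sketch below illustrates only that scheduling intuition: a flow that has sent few packets would stall on a repetitive timeout if its next packet were lost, so its packets are dequeued ahead of packets from high-rate flows. The per-flow arrival counter used as the priority key is my simplification; the paper's actual level structure and adaptation are not reproduced here.

```python
import heapq
from collections import defaultdict


class TimeoutAwareQueue:
    """Illustrative sketch of TAQ's core idea, not the paper's algorithm:
    packets from low-rate flows are served before packets from flows with
    many packets already enqueued."""

    def __init__(self):
        self.heap = []
        self.seq = 0                        # FIFO tie-break within a priority
        self.flow_count = defaultdict(int)  # packets seen per flow so far

    def enqueue(self, flow_id, pkt):
        self.flow_count[flow_id] += 1
        # A lower per-flow count means the flow is closer to a timeout if
        # this packet is lost, so it gets a smaller (earlier) heap key.
        heapq.heappush(self.heap,
                       (self.flow_count[flow_id], self.seq, flow_id, pkt))
        self.seq += 1

    def dequeue(self):
        if not self.heap:
            return None
        _, _, flow_id, pkt = heapq.heappop(self.heap)
        return flow_id, pkt
```

If an "elephant" flow enqueues three packets and a "mouse" flow then enqueues one, the mouse's single packet is served second — ahead of the elephant's remaining backlog — which is the fairness effect the abstract targets for small packet regimes.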
Journal
Proceedings of the Eleventh European Conference on Computer Systems