
Latest publications from Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI)

Firmament: Fast, Centralized Cluster Scheduling at Scale
Ionel Gog, Malte Schwarzkopf, A. Gleave, R. Watson, S. Hand
Centralized datacenter schedulers can make high-quality placement decisions when scheduling tasks in a cluster. Today, however, high-quality placements come at the cost of high latency at scale, which degrades response time for interactive tasks and reduces cluster utilization. This paper describes Firmament, a centralized scheduler that scales to over ten thousand machines at sub-second placement latency even though it continuously reschedules all tasks via a min-cost max-flow (MCMF) optimization. Firmament achieves low latency by using multiple MCMF algorithms, by solving the problem incrementally, and via problem-specific optimizations. Experiments with a Google workload trace from a 12,500-machine cluster show that Firmament improves placement latency by 20× over Quincy [22], a prior centralized scheduler using the same MCMF optimization. Moreover, even though Firmament is centralized, it matches the placement latency of distributed schedulers for workloads of short tasks. Finally, Firmament exceeds the placement quality of four widely-used centralized and distributed schedulers on a real-world cluster, and hence improves batch task response time by 6×.
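As an illustration of the scheduling-as-flow idea behind Quincy and Firmament, the sketch below encodes a toy placement problem as a min-cost max-flow network and solves it with networkx; the task names, costs, slot capacities, and the use of networkx are assumptions for illustration, not Firmament's actual solver or cost model.

```python
# A minimal sketch of scheduling-as-min-cost-max-flow, in the spirit of
# Quincy/Firmament. Tasks supply one unit of flow each; machines forward
# flow to a sink through capacity-limited "slot" edges; edge weights encode
# hypothetical placement costs (e.g. data locality).
import networkx as nx

tasks = ["t0", "t1", "t2"]
machines = ["m0", "m1"]
cost = {("t0", "m0"): 2, ("t0", "m1"): 5,
        ("t1", "m0"): 4, ("t1", "m1"): 1,
        ("t2", "m0"): 3, ("t2", "m1"): 3}

G = nx.DiGraph()
for t in tasks:
    G.add_node(t, demand=-1)               # each task supplies one unit of flow
G.add_node("sink", demand=len(tasks))      # the sink absorbs all placed tasks
for m in machines:
    G.add_edge(m, "sink", capacity=2, weight=0)   # per-machine task slots
for (t, m), c in cost.items():
    G.add_edge(t, m, capacity=1, weight=c)

flow = nx.min_cost_flow(G)                 # one MCMF run = one scheduling round
placement = {t: m for t in tasks for m in machines if flow[t].get(m, 0) == 1}
print(placement)
```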
{"title":"Firmament: Fast, Centralized Cluster Scheduling at Scale","authors":"Ionel Gog, Malte Schwarzkopf, A. Gleave, R. Watson, S. Hand","doi":"10.17863/CAM.9784","DOIUrl":"https://doi.org/10.17863/CAM.9784","url":null,"abstract":"Centralized datacenter schedulers can make high-quality placement decisions when scheduling tasks in a cluster. Today, however, high-quality placements come at the cost of high latency at scale, which degrades response time for interactive tasks and reduces cluster utilization. \u0000 \u0000This paper describes Firmament, a centralized scheduler that scales to over ten thousand machines at sub-second placement latency even though it continuously reschedules all tasks via a min-cost max-flow (MCMF) optimization. Firmament achieves low latency by using multiple MCMF algorithms, by solving the problem incrementally, and via problem-specific optimizations. \u0000 \u0000Experiments with a Google workload trace from a 12,500-machine cluster show that Firmament improves placement latency by 20× over Quincy [22], a prior centralized scheduler using the same MCMF optimization. Moreover, even though Firmament is centralized, it matches the placement latency of distributed schedulers for workloads of short tasks. Finally, Firmament exceeds the placement quality of four widely-used centralized and distributed schedulers on a real-world cluster, and hence improves batch task response time by 6×.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"28 1","pages":"99-115"},"PeriodicalIF":0.0,"publicationDate":"2016-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73883934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 168
Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering
Jialin Li, Ellis Michael, Naveen Kr. Sharma, Adriana Szekeres, Dan R. K. Ports
Distributed applications use replication, implemented by protocols like Paxos, to ensure data availability and transparently mask server failures. This paper presents a new approach to achieving replication in the data center without the performance cost of traditional methods. Our work carefully divides replication responsibility between the network and protocol layers. The network orders requests but does not ensure reliable delivery - using a new primitive we call ordered unreliable multicast (OUM). Implementing this primitive can be achieved with near-zero cost in the data center. Our new replication protocol, Network-Ordered Paxos (NOPaxos), exploits network ordering to provide strongly consistent replication without coordination. The resulting system not only outperforms both latency- and throughput-optimized protocols on their respective metrics, but also yields throughput within 2% and latency within 16 ms of an unreplicated system - providing replication without the performance cost.
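The sketch below illustrates the ordered unreliable multicast (OUM) idea in miniature: a sequencer stamps every message with a sequence number, and a receiver either delivers messages in stamp order or learns from a gap that one was dropped. The class and method names are assumptions for illustration, not the NOPaxos implementation.

```python
# Illustrative sketch of OUM: the sequencer orders messages, the network may
# still drop them, and each receiver detects drops as gaps in the stamps so
# the replication layer can resolve them (NOPaxos's gap agreement).
class Sequencer:
    def __init__(self):
        self.next_seq = 0

    def stamp(self, msg):
        seq = self.next_seq
        self.next_seq += 1
        return (seq, msg)          # the network would then multicast this

class Receiver:
    def __init__(self):
        self.expected = 0

    def on_multicast(self, stamped):
        seq, msg = stamped
        if seq == self.expected:
            self.expected += 1
            return ("DELIVER", msg)
        missing = list(range(self.expected, seq))   # stamps we never saw
        self.expected = seq + 1
        return ("DROP-NOTIFICATION", missing, msg)

seq = Sequencer()
r = Receiver()
m0, m1, m2 = seq.stamp("a"), seq.stamp("b"), seq.stamp("c")
print(r.on_multicast(m0))    # delivered in order
print(r.on_multicast(m2))    # m1 was lost: receiver reports the gap
```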
{"title":"Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering","authors":"Jialin Li, Ellis Michael, Naveen Kr. Sharma, Adriana Szekeres, Dan R. K. Ports","doi":"10.5555/3026877.3026914","DOIUrl":"https://doi.org/10.5555/3026877.3026914","url":null,"abstract":"Distributed applications use replication, implemented by protocols like Paxos, to ensure data availability and transparently mask server failures. This paper presents a new approach to achieving replication in the data center without the performance cost of traditional methods. Our work carefully divides replication responsibility between the network and protocol layers. The network orders requests but does not ensure reliable delivery - using a new primitive we call ordered unreliable multicast (OUM). Implementing this primitive can be achieved with near-zero-cost in the data center. Our new replication protocol, Network-Ordered Paxos (NOPaxos), exploits network ordering to provide strongly consistent replication without coordination. The resulting system not only outperforms both latency - and throughput-optimized protocols on their respective metrics, but also yields throughput within 2% and latency within 16 ms of an unreplicated system - providing replication without the performance cost.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"48 1","pages":"467-483"},"PeriodicalIF":0.0,"publicationDate":"2016-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72635711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 170
NetBricks: Taking the V out of NFV
Aurojit Panda, Sangjin Han, K. Jang, Melvin Walls, S. Ratnasamy, S. Shenker
The move from hardware middleboxes to software network functions, as advocated by NFV, has proven more challenging than expected. Developing new NFs remains a tedious process, requiring that developers repeatedly rediscover and reapply the same set of optimizations, while current techniques for providing isolation between NFs (using VMs or containers) incur high performance overheads. In this paper we describe NetBricks, a new NFV framework that tackles both these problems. For building NFs we take inspiration from modern data analytics frameworks (e.g., Spark and Dryad) and build a small set of customizable network processing elements. We also embrace type checking and safe runtimes to provide isolation in software, rather than rely on hardware isolation. NetBricks provides the same memory isolation as containers and VMs, without incurring the same performance penalties. To improve I/O efficiency, we introduce a novel technique called zero-copy software isolation.
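A rough sketch of the programming model the abstract describes: a network function assembled by chaining a few generic packet-processing operators. NetBricks provides such operators with compile-time safety checks; the operator names, the dict-based packet format, and the Python rendering here are purely illustrative assumptions.

```python
# Toy network function built from generic operators (parse, filter, transform),
# in the spirit of NetBricks' composable processing elements. Packets are
# plain dicts in this sketch; a real NF would operate on raw frames.
def nf_pipeline(packets):
    parsed = (parse_headers(p) for p in packets)                 # parse
    allowed = (p for p in parsed if p["dst_port"] != 23)         # filter telnet
    rewritten = (dict(p, ttl=p["ttl"] - 1) for p in allowed)     # transform
    return list(rewritten)

def parse_headers(raw):
    # Stand-in parser: packets in this sketch are already structured.
    return raw

pkts = [{"dst_port": 80, "ttl": 64}, {"dst_port": 23, "ttl": 64}]
print(nf_pipeline(pkts))    # the telnet packet is dropped, TTL decremented
```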
{"title":"NetBricks: Taking the V out of NFV","authors":"Aurojit Panda, Sangjin Han, K. Jang, Melvin Walls, S. Ratnasamy, S. Shenker","doi":"10.5555/3026877.3026894","DOIUrl":"https://doi.org/10.5555/3026877.3026894","url":null,"abstract":"The move from hardware middleboxes to software network functions, as advocated by NFV, has proven more challenging than expected. Developing new NFs remains a tedious process, requiring that developers repeatedly rediscover and reapply the same set of optimizations, while current techniques for providing isolation between NFs (using VMs or containers) incur high performance overheads. In this paper we describe NetBricks, a new NFV framework that tackles both these problems. For building NFs we take inspiration from modern data analytics frameworks (e.g., Spark and Dryad) and build a small set of customizable network processing elements. We also embrace type checking and safe runtimes to provide isolation in software, rather than rely on hardware isolation. NetBricks provides the same memory isolation as containers and VMs, without incurring the same performance penalties. To improve I/O efficiency, we introduce a novel technique called zero-copy software isolation.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"54 2 1","pages":"203-216"},"PeriodicalIF":0.0,"publicationDate":"2016-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77857253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 263
Shuffler: Fast and Deployable Continuous Code Re-Randomization
David Williams-King, Graham Gobieski, Kent Williams-King, James P. Blake, Xinhao Yuan, Patrick Colp, Michelle Zheng, V. Kemerlis, Junfeng Yang, W. Aiello
While code injection attacks have been virtually eliminated on modern systems, programs today remain vulnerable to code reuse attacks. Particularly pernicious are Just-In-Time ROP (JIT-ROP) techniques, where an attacker uses a memory disclosure vulnerability to discover code gadgets at runtime. We designed a code-reuse defense, called Shuffler, which continuously re-randomizes code locations on the order of milliseconds, introducing a real-time deadline on the attacker. This deadline makes it extremely difficult to form a complete exploit, particularly against server programs that often sit tens of milliseconds away from attacker machines. Shuffler focuses on being fast, self-hosting, and nonintrusive to the end user. Specifically, for speed, Shuffler randomizes code asynchronously in a separate thread and atomically switches from one code copy to the next. For security, Shuffler adopts an "egalitarian" principle and randomizes itself the same way it does the target. Lastly, to deploy Shuffler, no source, kernel, compiler, or hardware modifications are necessary. Evaluation shows that Shuffler defends against all known forms of code reuse, including ROP, direct JIT-ROP, indirect JIT-ROP, and Blind ROP. We observed 14.9% overhead on SPEC CPU when shuffling every 50 ms, and ran Shuffler on real-world applications such as Nginx. We showed that the shuffled Nginx scales up to 24 worker processes on 12 cores.
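The toy below mimics only the structure of the defense: a background thread periodically builds a fresh, permuted copy of a dispatch table and swaps it in with a single atomic reference assignment, while callers keep working. Real Shuffler relocates machine code within milliseconds; the table, handler names, and interval here are illustrative assumptions.

```python
# Toy model of "shuffle asynchronously, switch atomically": a daemon thread
# rebuilds a permuted copy of a call table every interval and replaces the
# reference callers dispatch through. Shuffling a Python dict is only a
# stand-in for relocating code; dispatch behavior is unchanged by design.
import random, threading, time

functions = {"handler_a": lambda: "a", "handler_b": lambda: "b"}
current_table = dict(functions)            # the copy callers dispatch through

def shuffle_loop(interval_s=0.05, rounds=10):
    global current_table
    for _ in range(rounds):
        items = list(functions.items())
        random.shuffle(items)              # stand-in for relocating code
        current_table = dict(items)        # atomic switch to the new copy
        time.sleep(interval_s)

t = threading.Thread(target=shuffle_loop, daemon=True)
t.start()
print(current_table["handler_a"]())        # callers are oblivious to shuffles
t.join()
```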
{"title":"Shuffler: Fast and Deployable Continuous Code Re-Randomization","authors":"David Williams-King, Graham Gobieski, Kent Williams-King, James P. Blake, Xinhao Yuan, Patrick Colp, Michelle Zheng, V. Kemerlis, Junfeng Yang, W. Aiello","doi":"10.5555/3026877.3026906","DOIUrl":"https://doi.org/10.5555/3026877.3026906","url":null,"abstract":"While code injection attacks have been virtually eliminated on modern systems, programs today remain vulnerable to code reuse attacks. Particularly pernicious are Just-In-Time ROP (JIT-ROP) techniques, where an attacker uses a memory disclosure vulnerability to discover code gadgets at runtime. We designed a code-reuse defense, called Shuffler, which continuously re-randomizes code locations on the order of milliseconds, introducing a real-time deadline on the attacker. This deadline makes it extremely difficult to form a complete exploit, particularly against server programs that often sit tens of milliseconds away from attacker machines.Shuffler focuses on being fast, self-hosting, and nonintrusive to the end user. Specifically, for speed, Shuffler randomizes code asynchronously in a separate thread and atomically switches from one code copy to the next. For security, Shuffler adopts an \"egalitarian\" principle and randomizes itself the same way it does the target. Lastly, to deploy Shuffler, no source, kernel, compiler, or hardware modifications are necessary.Evaluation shows that Shuffler defends against all known forms of code reuse, including ROP, direct JIT-ROP, indirect JIT-ROP, and Blind ROP. We observed 14.9% overhead on SPEC CPU when shuffling every 50 ms, and ran Shuffler on real-world applications such as Nginx. We showed that the shuffled Nginx scales up to 24 worker processes on 12 cores.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"51 1","pages":"367-382"},"PeriodicalIF":0.0,"publicationDate":"2016-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79538869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 120
Pebbles: Fine-Grained Data Management Abstractions for Modern Operating Systems
Riley Spahn, Jonathan Bell, Michael Z. Lee, Sravan Bhamidipati, Roxana Geambasu, G. Kaiser
Support for fine-grained data management has all but disappeared from modern operating systems such as Android and iOS. Instead, we must rely on each individual application to manage our data properly - e.g., to delete our emails, documents, and photos in full upon request; to not collect more data than required for its function; and to back up our data to reliable backends. Yet, research studies and media articles constantly remind us of the poor data management practices applied by our applications. We have developed Pebbles, a fine-grained data management system that enables management at a powerful new level of abstraction: application-level data objects, such as emails, documents, notes, notebooks, bank accounts, etc. The key contribution is Pebbles's ability to discover such high-level objects in arbitrary applications without requiring any input from or modifications to these applications. Intuitively, it seems impossible for an OS-level service to understand object structures in unmodified applications; however, we observe that the high-level storage abstractions embedded in modern OSes - relational databases and object-relational mappers - bear significant structural information that makes object recognition possible and accurate.
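The sketch below illustrates the observation the abstract ends on: the relational schema an application already uses (here SQLite, as on Android and iOS) encodes enough structure to recover an object graph from foreign-key metadata alone. The example schema and helper function are hypothetical, not Pebbles code.

```python
# Recover an application-level object graph from SQLite schema metadata:
# each table becomes an object type, and foreign keys reveal which objects
# belong to which (e.g. notes belong to notebooks).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE notebooks (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT,
                        notebook_id INTEGER REFERENCES notebooks(id));
""")

def object_graph(conn):
    """Map each table to the tables it references, via foreign-key metadata."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    graph = {}
    for t in tables:
        fks = conn.execute(f"PRAGMA foreign_key_list({t})").fetchall()
        graph[t] = [fk[2] for fk in fks]   # column 2 is the referenced table
    return graph

print(object_graph(db))    # {'notebooks': [], 'notes': ['notebooks']}
```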
{"title":"Pebbles: Fine-Grained Data Management Abstractions for Modern Operating Systems","authors":"Riley Spahn, Jonathan Bell, Michael Z. Lee, Sravan Bhamidipati, Roxana Geambasu, G. Kaiser","doi":"10.5555/2685048.2685058","DOIUrl":"https://doi.org/10.5555/2685048.2685058","url":null,"abstract":"Support for fine-grained data management has all but disappeared from modern operating systems such as Android and iOS. Instead, we must rely on each individual application to manage our data properly - e.g., to delete our emails, documents, and photos in full upon request; to not collect more data than required for its function; and to back up our data to reliable backends. Yet, research studies and media articles constantly remind us of the poor data management practices applied by our applications. We have developed Pebbles, a fine-grained data management system that enables management at a powerful new level of abstraction: application-level data objects, such as emails, documents, notes, notebooks, bank accounts, etc. The key contribution is Pebbles's ability to discover such high-level objects in arbitrary applications without requiring any input from or modifications to these applications. Intuitively, it seems impossible for an OS-level service to understand object structures in unmodified applications, however we observe that the high-level storage abstractions embedded in modern OSes - relational databases and object-relational mappers - bear significant structural information that makes object recognition possible and accurate.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"15 1","pages":"113-129"},"PeriodicalIF":0.0,"publicationDate":"2014-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72915290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 36
Heading Off Correlated Failures through Independence-as-a-Service
Ennan Zhai, Ruichuan Chen, D. Wolinsky, B. Ford
Today's systems pervasively rely on redundancy to ensure reliability. In complex multi-layered hardware/software stacks, however - especially in the clouds where many independent businesses deploy interacting services on common infrastructure - seemingly independent systems may share deep, hidden dependencies, undermining redundancy efforts and introducing unanticipated correlated failures. Complementing existing post-failure forensics, we propose Independence-as-a-Service (or INDaaS), an architecture to audit the independence of redundant systems proactively, thus avoiding correlated failures. INDaaS first utilizes pluggable dependency acquisition modules to collect the structural dependency information (including network, hardware, and software dependencies) from a variety of sources. With this information, INDaaS then quantifies the independence of systems of interest using pluggable auditing modules, offering various performance, precision, and data secrecy tradeoffs. While the most general and efficient auditing modules assume the auditor is able to obtain all required information, INDaaS can employ private set intersection cardinality protocols to quantify the independence even across businesses unwilling to share their full structural information with anyone. We evaluate the practicality of INDaaS with three case studies via auditing realistic network, hardware, and software dependency structures.
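A minimal sketch of one independence audit, assuming the auditor can see both dependency sets: each redundant deployment is modeled as a set of infrastructure components and scored by their overlap. INDaaS's private set intersection cardinality protocols would compute the size of the overlap without revealing the sets themselves; the component names and the Jaccard-style score here are illustrative assumptions.

```python
# Score the independence of two supposedly redundant replicas by the overlap
# of the infrastructure components each one depends on. Shared components are
# exactly the hidden single points of failure the audit is looking for.
replica_a = {"rack-7", "switch-agg-2", "power-feed-1", "storage-svc"}
replica_b = {"rack-9", "switch-agg-2", "power-feed-1", "dns-svc"}

shared = replica_a & replica_b
overlap = len(shared) / len(replica_a | replica_b)   # Jaccard-style overlap

print(f"shared dependencies: {sorted(shared)}")
print(f"independence score: {1 - overlap:.2f}")      # 1.0 = fully independent
```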
{"title":"Heading Off Correlated Failures through Independence-as-a-Service","authors":"Ennan Zhai, Ruichuan Chen, D. Wolinsky, B. Ford","doi":"10.5555/2685048.2685073","DOIUrl":"https://doi.org/10.5555/2685048.2685073","url":null,"abstract":"Today's systems pervasively rely on redundancy to ensure reliability. In complex multi-layered hardware/software stacks, however - especially in the clouds where many independent businesses deploy interacting services on common infrastructure - seemingly independent systems may share deep, hidden dependencies, undermining redundancy efforts and introducing unanticipated correlated failures. Complementing existing post-failure forensics, we propose Independence-as-a-Service (or INDaaS), an architecture to audit the independence of redundant systems proactively, thus avoiding correlated failures. INDaaS first utilizes pluggable dependency acquisition modules to collect the structural dependency information (including network, hardware, and software dependencies) from a variety of sources. With this information, INDaaS then quantifies the independence of systems of interest using pluggable auditing modules, offering various performance, precision, and data secrecy tradeoffs. While the most general and efficient auditing modules assume the auditor is able to obtain all required information, INDaaS can employ private set intersection cardinality protocols to quantify the independence even across businesses unwilling to share their full structural information with anyone. We evaluate the practicality of INDaaS with three case studies via auditing realistic network, hardware, and software dependency structures.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"130 1","pages":"317-334"},"PeriodicalIF":0.0,"publicationDate":"2014-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75399065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 44
GraphX: Graph Processing in a Distributed Dataflow Framework
Joseph E. Gonzalez, Reynold Xin, Ankur Dave, D. Crankshaw, M. Franklin, I. Stoica
In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, group-by). To achieve performance parity with specialized graph systems, GraphX recasts graph-specific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings low-cost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
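The sketch below shows the reduction the paper argues for, on a toy scale: one PageRank-style update written purely as a join of an edge table with current vertex ranks followed by a group-by-and-sum on destinations. Plain Python dicts stand in for Spark's distributed dataflow operators; the graph and damping factor are illustrative.

```python
# Graph computation as ordinary dataflow operators: join edges with source
# ranks, group contributions by destination, aggregate, then update ranks.
from collections import defaultdict

ranks = {"a": 1.0, "b": 1.0, "c": 1.0}           # vertex table
edges = [("a", "b"), ("a", "c"), ("b", "c")]      # edge table
out_degree = defaultdict(int)
for src, _ in edges:
    out_degree[src] += 1

# "join": attach each edge to its source vertex's current rank.
messages = [(dst, ranks[src] / out_degree[src]) for src, dst in edges]

# "group-by + aggregate": sum contributions arriving at each destination.
incoming = defaultdict(float)
for dst, contrib in messages:
    incoming[dst] += contrib

damping = 0.85
new_ranks = {v: (1 - damping) + damping * incoming[v] for v in ranks}
print(new_ranks)
```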
{"title":"GraphX: Graph Processing in a Distributed Dataflow Framework","authors":"Joseph E. Gonzalez, Reynold Xin, Ankur Dave, D. Crankshaw, M. Franklin, I. Stoica","doi":"10.5555/2685048.2685096","DOIUrl":"https://doi.org/10.5555/2685048.2685096","url":null,"abstract":"In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, group-by). To achieve performance parity with specialized graph systems, GraphX recasts graph-specific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings low-cost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"45 1","pages":"599-613"},"PeriodicalIF":0.0,"publicationDate":"2014-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85783191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1078
Nail: A Practical Tool for Parsing and Generating Data Formats
Julian Bangert, N. Zeldovich
Nail is a tool that greatly reduces the programmer effort for safely parsing and generating data formats defined by a grammar. Nail introduces several key ideas to achieve its goal. First, Nail uses a protocol grammar to define not just the data format, but also the internal object model of the data. Second, Nail eliminates the notion of semantic actions, used by existing parser generators, which reduces the expressive power but allows Nail to both parse data formats and generate them from the internal object model, by establishing a semantic bijection between the data format and the object model. Third, Nail introduces dependent fields and stream transforms to capture protocol features such as size and offset fields, checksums, and compressed data, which are impractical to express in existing protocol languages. Using Nail, we implement an authoritative DNS server in C in under 300 lines of code and grammar, and an unzip program in C in 220 lines of code and grammar, demonstrating that Nail makes it easy to parse complex real-world data formats. Performance experiments show that a Nail-based DNS server can outperform the widely used BIND DNS server on an authoritative workload, demonstrating that systems built with Nail can achieve good performance.
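The hand-written pair below illustrates the property Nail derives automatically from a grammar: one description of a length-prefixed string format yields both a parser into an internal object and a generator back out, with the length treated as a dependent field that never appears in the object model. The format and function names are assumptions, not Nail's grammar syntax (which is its own DSL).

```python
# Parser/generator pair for a length-prefixed UTF-8 string. The length is a
# dependent field: the parser consumes it, the generator recomputes it, and
# the internal object model never stores it - the "semantic bijection" idea.
import struct

def parse(buf):
    (length,) = struct.unpack_from(">H", buf, 0)     # dependent length field
    value = buf[2:2 + length].decode("utf-8")
    return {"value": value}                          # object model omits length

def generate(obj):
    data = obj["value"].encode("utf-8")
    return struct.pack(">H", len(data)) + data       # length recomputed

msg = generate({"value": "hello"})
assert parse(msg) == {"value": "hello"}              # round-trips exactly
print(msg)
```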
{"title":"Nail: A Practical Tool for Parsing and Generating Data Formats","authors":"Julian Bangert, N. Zeldovich","doi":"10.5555/2685048.2685098","DOIUrl":"https://doi.org/10.5555/2685048.2685098","url":null,"abstract":"Nail is a tool that greatly reduces the programmer effort for safely parsing and generating data formats defined by a grammar. Nail introduces several key ideas to achieve its goal. First, Nail uses a protocol grammar to define not just the data format, but also the internal object model of the data. Second, Nail eliminates the notion of semantic actions, used by existing parser generators, which reduces the expressive power but allows Nail to both parse data formats and generate them from the internal object model, by establishing a semantic bijection between the data format and the object model. Third, Nail introduces dependent fields and stream transforms to capture protocol features such as size and offset fields, checksums, and compressed data, which are impractical to express in existing protocol languages. Using Nail, we implement an authoritative DNS server in C in under 300 lines of code and grammar, and an unzip program in C in 220 lines of code and grammar, demonstrating that Nail makes it easy to parse complex real-world data formats. Performance experiments show that a Nail-based DNS server can outperform the widely used BIND DNS server on an authoritative workload, demonstrating that systems built with Nail can achieve good performance.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"39 1","pages":"615-628"},"PeriodicalIF":0.0,"publicationDate":"2014-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74800919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 45
Eternal Sunshine of the Spotless Machine: Protecting Privacy with Ephemeral Channels
Alan M Dunn, Michael Z Lee, Suman Jana, Sangman Kim, Mark Silberstein, Yuanzhong Xu, Vitaly Shmatikov, Emmett Witchel

Modern systems keep long memories. As we show in this paper, an adversary who gains access to a Linux system, even one that implements secure deallocation, can recover the contents of applications' windows, audio buffers, and data remaining in device drivers-long after the applications have terminated. We design and implement Lacuna, a system that allows users to run programs in "private sessions." After the session is over, all memories of its execution are erased. The key abstraction in Lacuna is an ephemeral channel, which allows the protected program to talk to peripheral devices while making it possible to delete the memories of this communication from the host. Lacuna can run unmodified applications that use graphics, sound, USB input devices, and the network, with only 20 percentage points of additional CPU utilization.
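The toy below models the ephemeral-channel abstraction in the simplest possible way: data crossing the channel is combined with a session secret, so scrubbing that one secret at session end leaves any copies buffered elsewhere unreadable. The one-time-pad construction and class name are stand-in assumptions; Lacuna applies the idea to real graphics, audio, USB, and network paths.

```python
# Toy ephemeral channel: the host only ever sees data combined with a pad
# held inside the private session, so erasing the pad erases the session's
# memories no matter what the host still has buffered.
import os

class EphemeralChannel:
    def __init__(self, size):
        self._pad = bytearray(os.urandom(size))   # session secret, kept in RAM
        self._used = 0

    def send(self, data: bytes) -> bytes:
        start = self._used
        self._used += len(data)
        pad = self._pad[start:self._used]
        return bytes(b ^ p for b, p in zip(data, pad))   # what the host buffers

    def erase(self):
        for i in range(len(self._pad)):            # scrub the only key copy
            self._pad[i] = 0

chan = EphemeralChannel(64)
ciphertext = chan.send(b"keystrokes in a private session")
chan.erase()
# The host may still hold `ciphertext`, but without the pad it reveals nothing.
print(ciphertext)
```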

{"title":"Eternal Sunshine of the Spotless Machine: Protecting Privacy with Ephemeral Channels.","authors":"Alan M Dunn,&nbsp;Michael Z Lee,&nbsp;Suman Jana,&nbsp;Sangman Kim,&nbsp;Mark Silberstein,&nbsp;Yuanzhong Xu,&nbsp;Vitaly Shmatikov,&nbsp;Emmett Witchel","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Modern systems keep long memories. As we show in this paper, an adversary who gains access to a Linux system, even one that implements secure deallocation, can recover the contents of applications' windows, audio buffers, and data remaining in device drivers-long after the applications have terminated. We design and implement Lacuna, a system that allows users to run programs in \"private sessions.\" After the session is over, all memories of its execution are erased. The key abstraction in Lacuna is an <i>ephemeral channel</i>, which allows the protected program to talk to peripheral devices while making it possible to delete the memories of this communication from the host. Lacuna can run unmodified applications that use graphics, sound, USB input devices, and the network, with only 20 percentage points of additional CPU utilization.</p>","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":" ","pages":"61-75"},"PeriodicalIF":0.0,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3992258/pdf/nihms504322.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32282138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reining in the Outliers in Map-Reduce Clusters using Mantri
G. Ananthanarayanan, Srikanth Kandula, A. Greenberg, I. Stoica, Yi Lu, Bikas Saha, E. Harris
Experience from an operational Map-Reduce cluster reveals that outliers significantly prolong job completion. The causes for outliers include run-time contention for processor, memory and other resources, disk failures, varying bandwidth and congestion along network paths, and imbalance in task workload. We present Mantri, a system that monitors tasks and culls outliers using cause- and resource-aware techniques. Mantri's strategies include restarting outliers, network-aware placement of tasks and protecting outputs of valuable tasks. Using real-time progress reports, Mantri detects and acts on outliers early in their lifetime. Early action frees up resources that can be used by subsequent tasks and expedites the job overall. Acting based on the causes and the resource and opportunity cost of actions lets Mantri improve over prior work that only duplicates the laggards. Deployment in Bing's production clusters and trace-driven simulations show that Mantri improves job completion times by 32%.
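As a sketch of the cause- and cost-aware decisions described above, the function below acts on a straggler only when its estimated remaining time clearly exceeds what a fresh copy should take, so intervention frees resources instead of wasting them. The linear progress estimator, threshold, and numbers are illustrative assumptions, not Mantri's actual model.

```python
# Restart a straggling task only if the time it still needs (estimated from
# its progress so far) clearly exceeds what a fresh run typically takes.
def should_restart(progress, elapsed_s, typical_runtime_s, slack=1.5):
    if progress <= 0:
        return True                                   # no progress at all
    est_total = elapsed_s / progress                  # linear progress estimate
    est_remaining = est_total - elapsed_s
    return est_remaining > slack * typical_runtime_s  # remaining >> fresh run

# A task 20% done after 400 s, where peer tasks finish in ~120 s:
print(should_restart(progress=0.2, elapsed_s=400, typical_runtime_s=120))  # True
```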
{"title":"Reining in the Outliers in Map-Reduce Clusters using Mantri","authors":"G. Ananthanarayanan, Srikanth Kandula, A. Greenberg, I. Stoica, Yi Lu, Bikas Saha, E. Harris","doi":"10.5555/1924943.1924962","DOIUrl":"https://doi.org/10.5555/1924943.1924962","url":null,"abstract":"Experience froman operational Map-Reduce cluster reveals that outliers significantly prolong job completion. The causes for outliers include run-time contention for processor, memory and other resources, disk failures, varying bandwidth and congestion along network paths and, imbalance in task workload. We present Mantri, a system that monitors tasks and culls outliers using cause- and resource-aware techniques. Mantri's strategies include restarting outliers, network-aware placement of tasks and protecting outputs of valuable tasks. Using real-time progress reports, Mantri detects and acts on outliers early in their lifetime. Early action frees up resources that can be used by subsequent tasks and expedites the job overall. Acting based on the causes and the resource and opportunity cost of actions lets Mantri improve over prior work that only duplicates the laggards. Deployment in Bing's production clusters and trace-driven simulations show that Mantri improves job completion times by 32%.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"11 1","pages":"265-278"},"PeriodicalIF":0.0,"publicationDate":"2010-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77678012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 770