
Proceedings of the 27th ACM Symposium on Operating Systems Principles: latest publications

I4
Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359651
Haojun Ma, Aman Goel, Jean-Baptiste Jeannin, Manos Kapritsos, Baris Kasikci, K. Sakallah
Designing and implementing distributed systems correctly is a very challenging task. Recently, formal verification has been successfully used to prove the correctness of distributed systems. At the heart of formal verification lies a computer-checked proof with an inductive invariant. Finding this inductive invariant, however, is the most difficult part of the proof. Alas, current proof techniques require inductive invariants to be found manually, and painstakingly, by the developer. In this paper, we present a new approach, Incremental Inference of Inductive Invariants (I4), to automatically generate inductive invariants for distributed protocols. The essence of our idea is simple: the inductive invariant of a finite instance of the protocol can be used to infer a general inductive invariant for the infinite distributed protocol. In I4, we create a finite instance of the protocol; use a model checking tool to automatically derive the inductive invariant for this finite instance; and generalize this invariant to an inductive invariant for the infinite protocol. Our experiments show that I4 can prove the correctness of several distributed protocols like Chord, 2PC and Transaction Chains with little to no human effort.
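The inductiveness check at the core of this approach can be illustrated on a toy finite instance. The sketch below is illustrative Python, not I4's pipeline (which drives a model checking tool over real protocol descriptions): it enumerates all states of an n-node mutual-exclusion protocol and checks that an invariant holds initially and is preserved by every transition.

```python
from itertools import product

# Toy protocol instantiated with a finite number of nodes; each node is in
# one of three phases. All names here are invented for illustration.
PHASES = ("idle", "waiting", "critical")

def initial(state):
    return all(p == "idle" for p in state)

def step(state):
    """Yield all successor states (one node moves at a time)."""
    for i, p in enumerate(state):
        if p == "idle":
            yield state[:i] + ("waiting",) + state[i + 1:]
        elif p == "waiting" and "critical" not in state:
            yield state[:i] + ("critical",) + state[i + 1:]
        elif p == "critical":
            yield state[:i] + ("idle",) + state[i + 1:]

def mutual_exclusion(state):
    return state.count("critical") <= 1

def is_inductive(inv, n):
    """inv holds in every initial state, and every step from an
    inv-satisfying state lands in an inv-satisfying state."""
    for s in product(PHASES, repeat=n):
        if initial(s) and not inv(s):
            return False
        if inv(s) and any(not inv(t) for t in step(s)):
            return False
    return True
```

For this protocol, mutual exclusion is itself inductive; an invariant such as "all nodes idle" holds initially but is not preserved, which is exactly the distinction the brute-force check exposes.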
Citations: 1
KVell
Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359628
Baptiste Lepers, Oana Balmau, Karan Gupta, W. Zwaenepoel
Modern block-addressable NVMe SSDs provide much higher bandwidth and similar performance for random and sequential access. Persistent key-value stores (KVs) designed for earlier storage devices, using either Log-Structured Merge (LSM) or B trees, do not take full advantage of these new devices. Logic to avoid random accesses, expensive operations for keeping data sorted on disk, and synchronization bottlenecks make these KVs CPU-bound on NVMe SSDs. We present a new persistent KV design. Unlike earlier designs, no attempt is made at sequential access, and data is not sorted when stored on disk. A shared-nothing philosophy is adopted to avoid synchronization overhead. Together with batching of device accesses, these design decisions make for read and write performance close to device bandwidth. Finally, maintaining an inexpensive partial sort in memory produces adequate scan performance. We implement this design in KVell, the first persistent KV able to utilize modern NVMe SSDs at maximum bandwidth. We compare KVell against available state-of-the-art LSM and B tree KVs, both with synthetic benchmarks and production workloads. KVell achieves throughput at least 2x that of its closest competitor on read-dominated workloads, and 5x on write-dominated workloads. For workloads that contain mostly scans, KVell performs comparably or better than its competitors. KVell provides maximum latencies an order of magnitude lower than the best of its competitors, even on scan-based workloads.
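The design decisions described above (shared-nothing partitioning, data left unsorted on disk, and an in-memory partial sort for scans) can be sketched as follows. This is an illustrative Python toy, not KVell's implementation; the `slab` list stands in for unsorted on-disk pages.

```python
import bisect

class Partition:
    """One worker's shard: values live unsorted in an append-only slab
    (standing in for on-disk pages); only keys are kept sorted in memory."""
    def __init__(self):
        self.slab = []   # unsorted value log, as on disk
        self.keys = []   # sorted in-memory key index
        self.index = {}  # key -> slab offset

    def put(self, key, value):
        if key in self.index:
            self.slab[self.index[key]] = value  # in-place update, no re-sort
        else:
            bisect.insort(self.keys, key)
            self.index[key] = len(self.slab)
            self.slab.append(value)

    def get(self, key):
        off = self.index.get(key)
        return None if off is None else self.slab[off]

class KV:
    """Shared-nothing: each key hashes to exactly one partition, so
    partitions need no locks when each is driven by a single thread."""
    def __init__(self, nparts=4):
        self.parts = [Partition() for _ in range(nparts)]

    def _part(self, key):
        return self.parts[hash(key) % len(self.parts)]

    def put(self, key, value):
        self._part(key).put(key, value)

    def get(self, key):
        return self._part(key).get(key)

    def scan(self, lo, hi):
        """Merge the per-partition sorted key indexes for a range scan."""
        out = []
        for p in self.parts:
            i = bisect.bisect_left(p.keys, lo)
            while i < len(p.keys) and p.keys[i] < hi:
                k = p.keys[i]
                out.append((k, p.slab[p.index[k]]))
                i += 1
        return sorted(out)
```

The scan path shows why a partial sort suffices: only keys are sorted, and only in memory, while values stay wherever they were appended.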
Citations: 101
Scaling symbolic evaluation for automated verification of systems code with Serval
Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359641
Luke Nelson, James Bornholt, Ronghui Gu, Andrew Baumann, E. Torlak, Xi Wang
This paper presents Serval, a framework for developing automated verifiers for systems software. Serval provides an extensible infrastructure for creating verifiers by lifting interpreters under symbolic evaluation, and a systematic approach to identifying and repairing verification performance bottlenecks using symbolic profiling and optimizations. Using Serval, we build automated verifiers for the RISC-V, x86-32, LLVM, and BPF instruction sets. We report our experience of retrofitting CertiKOS and Komodo, two systems previously verified using Coq and Dafny, respectively, for automated verification using Serval, and discuss trade-offs of different verification methodologies. In addition, we apply Serval to the Keystone security monitor and the BPF compilers in the Linux kernel, and uncover 18 new bugs through verification, all confirmed and fixed by developers.
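The idea of "lifting interpreters under symbolic evaluation" can be illustrated with a toy: an interpreter written once that runs unchanged over both concrete and symbolic register values, building expression terms when an operand is unknown. A hedged Python sketch (Serval itself is built on Rosette; the instruction set and the `Sym` class here are invented for illustration):

```python
class Sym:
    """A symbolic value: an operator applied to concrete or symbolic args."""
    def __init__(self, op, *args):
        self.op, self.args = op, args
    def __repr__(self):
        return "(" + self.op + " " + " ".join(map(repr, self.args)) + ")"

def add(a, b):
    # The same interpreter code handles concrete and symbolic operands:
    if isinstance(a, int) and isinstance(b, int):
        return a + b        # concrete fast path
    return Sym("+", a, b)   # otherwise build a symbolic term

def run(program, regs):
    """Interpret a toy instruction set; `regs` may hold symbolic inputs."""
    for insn in program:
        if insn[0] == "li":      # load immediate: ("li", rd, imm)
            regs[insn[1]] = insn[2]
        elif insn[0] == "add":   # ("add", rd, rs1, rs2)
            regs[insn[1]] = add(regs[insn[2]], regs[insn[3]])
    return regs
```

Running the same program with `r1 = 2` yields a concrete result, while running it with `r1 = Sym("var", "x")` yields a symbolic term describing all possible results, which is the artifact a verifier then reasons about.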
Citations: 75
Recipe: converting concurrent DRAM indexes to persistent-memory indexes
Pub Date : 2019-09-23 DOI: 10.1145/3341301.3359635
Se Kwon Lee, Jayashree Mohan, Sanidhya Kashyap, Taesoo Kim, Vijay Chidambaram
We present Recipe, a principled approach for converting concurrent DRAM indexes into crash-consistent indexes for persistent memory (PM). The main insight behind Recipe is that isolation provided by a certain class of concurrent in-memory indexes can be translated with small changes to crash-consistency when the same index is used in PM. We present a set of conditions that enable the identification of this class of DRAM indexes, and the actions to be taken to convert each index to be persistent. Based on these conditions and conversion actions, we modify five different DRAM indexes based on B+ trees, tries, radix trees, and hash tables to their crash-consistent PM counterparts. The effort involved in this conversion is minimal, requiring 30--200 lines of code. We evaluated the converted PM indexes on Intel DC Persistent Memory, and found that they outperform state-of-the-art, hand-crafted PM indexes in multi-threaded workloads by up-to 5.2x. For example, we built P-CLHT, our PM implementation of the CLHT hash table by modifying only 30 LOC. When running YCSB workloads, P-CLHT performs up to 2.4x better than Cacheline-Conscious Extendible Hashing (CCEH), the state-of-the-art PM hash table.
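The kind of small, mechanical change Recipe prescribes can be sketched for a CLHT-style hash table: the DRAM version commits an insert with one word-sized store, and the persistent version adds a flush and fence after that store. A Python sketch with simulated persistence primitives (real code would use clwb/sfence instructions, e.g. via PMDK; `flush` and `fence` here are stand-ins, and the table is deliberately minimal, with no resizing):

```python
# Simulated persistence primitives; on real PM hardware these would be
# cache-line flushes followed by a store fence.
FLUSHED = []

def flush(obj):   # stand-in for flushing obj's cache line
    FLUSHED.append(id(obj))

def fence():      # stand-in for a store fence ordering the flushes
    pass

class Bucket:
    def __init__(self):
        self.slots = [None] * 4   # (key, value) pairs

class PMHash:
    """A hash table whose insert becomes crash-consistent via a single
    flush+fence after the word-sized slot write: the pattern for indexes
    whose writers commit with one atomic in-place store."""
    def __init__(self, nbuckets=16):
        self.buckets = [Bucket() for _ in range(nbuckets)]

    def put(self, key, value):
        b = self.buckets[hash(key) % len(self.buckets)]
        for i, slot in enumerate(b.slots):
            if slot is None or slot[0] == key:
                b.slots[i] = (key, value)  # atomic commit point
                flush(b.slots)             # persist the slot...
                fence()                    # ...before the insert is "done"
                return True
        return False                       # bucket full (no resize in sketch)

    def get(self, key):
        b = self.buckets[hash(key) % len(self.buckets)]
        for slot in b.slots:
            if slot is not None and slot[0] == key:
                return slot[1]
        return None
```

A reader that observes the slot either sees the old value or the complete new pair, which is why no extra logging is needed for this class of index.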
Citations: 133
SplitFS: reducing software overhead in file systems for persistent memory
Pub Date : 2019-09-23 DOI: 10.1145/3341301.3359631
Rohan Kadekodi, Se Kwon Lee, Sanidhya Kashyap, Taesoo Kim, Aasheesh Kolli, Vijay Chidambaram
We present SplitFS, a file system for persistent memory (PM) that reduces software overhead significantly compared to state-of-the-art PM file systems. SplitFS presents a novel split of responsibilities between a user-space library file system and an existing kernel PM file system. The user-space library file system handles data operations by intercepting POSIX calls, memory-mapping the underlying file, and serving reads and overwrites using processor loads and stores. Metadata operations are handled by the kernel PM file system (ext4 DAX). SplitFS introduces a new primitive termed relink to efficiently support file appends and atomic data operations. SplitFS provides three consistency modes, which different applications can choose from, without interfering with each other. SplitFS reduces software overhead by up-to 4x compared to the NOVA PM file system, and 17x compared to ext4 DAX.
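The split described above can be sketched directly with mmap: data operations become loads and stores on a memory-mapped file, while metadata operations fall through to the kernel file system. A minimal Python illustration (not SplitFS itself, which interposes on POSIX calls transparently and targets ext4 DAX; the demo path is invented):

```python
import mmap
import os
import tempfile

class SplitFile:
    """Data ops (read/overwrite) go through a memory map, i.e. processor
    loads and stores; metadata ops (size, close) fall through to the
    kernel file system, mirroring the division of labor above."""
    def __init__(self, path, size):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        os.ftruncate(self.fd, size)          # metadata: kernel FS
        self.map = mmap.mmap(self.fd, size)  # data path: user space

    def pwrite(self, data, offset):          # overwrite = memory store
        self.map[offset:offset + len(data)] = data

    def pread(self, length, offset):         # read = memory load
        return self.map[offset:offset + length]

    def size(self):                          # metadata: kernel FS
        return os.fstat(self.fd).st_size

    def close(self):
        self.map.flush()                     # push dirty data down
        self.map.close()
        os.close(self.fd)

# Demo on a throwaway file (path is illustrative).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.dat")
f = SplitFile(demo_path, 4096)
f.pwrite(b"hello", 0)
data = f.pread(5, 0)
reported_size = f.size()
f.close()
```

Note what this sketch cannot show: appends grow the file, which is a metadata change, and that is exactly the case SplitFS's relink primitive exists to make cheap and atomic.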
Citations: 143
Privacy accounting and quality control in the Sage differentially private ML platform
Pub Date : 2019-09-04 DOI: 10.1145/3341301.3359639
Mathias Lécuyer, Riley Spahn, Kiran Vodrahalli, Roxana Geambasu, Daniel J. Hsu
Companies increasingly expose machine learning (ML) models trained over sensitive user data to untrusted domains, such as end-user devices and wide-access model stores. This creates a need to control the data's leakage through these models. We present Sage, a differentially private (DP) ML platform that bounds the cumulative leakage of training data through models. Sage builds upon the rich literature on DP ML algorithms and contributes pragmatic solutions to two of the most pressing systems challenges of global DP: running out of privacy budget and the privacy-utility tradeoff. To address the former, we develop block composition, a new privacy loss accounting method that leverages the growing database regime of ML workloads to keep training models endlessly on a sensitive data stream while enforcing a global DP guarantee for the stream. To address the latter, we develop privacy-adaptive training, a process that trains a model on growing amounts of data and/or with increasing privacy parameters until, with high probability, the model meets developer-configured quality criteria. Sage's methods are designed to integrate with TensorFlow-Extended, Google's open-source ML platform. They illustrate how a systems focus on characteristics of ML workloads enables pragmatic solutions that are not apparent when one focuses on individual algorithms, as most DP ML literature does.
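Block composition can be sketched as a per-block ledger: each incoming data block carries its own budget, and a training run charges its epsilon only to the blocks it reads, so freshly arrived blocks restore headroom and the stream never permanently "runs out". A simplified Python accountant (illustrative only; Sage's real accounting is considerably richer, e.g. it also tracks delta):

```python
class BlockAccountant:
    """Per-block privacy-loss ledger. A run over blocks [lo, hi) charges
    its epsilon to those blocks only; the global guarantee holds because
    no single block ever exceeds the global epsilon."""
    def __init__(self, global_epsilon):
        self.global_epsilon = global_epsilon
        self.spent = []   # epsilon consumed per block, in arrival order

    def new_block(self):
        """A new block of the stream arrives with a fresh, empty ledger."""
        self.spent.append(0.0)
        return len(self.spent) - 1   # block id

    def charge(self, lo, hi, epsilon):
        """Admit a training run over blocks [lo, hi), or refuse it if any
        touched block would exceed the global bound."""
        if any(self.spent[b] + epsilon > self.global_epsilon
               for b in range(lo, hi)):
            return False
        for b in range(lo, hi):
            self.spent[b] += epsilon
        return True
```

The refusal path is what privacy-adaptive training interacts with: a rejected run can be retried later over newer blocks, or with a smaller epsilon.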
Citations: 7
Optimizing data-intensive computations in existing libraries with split annotations
Pub Date : 2018-10-29 DOI: 10.1145/3341301.3359652
Shoumik Palkar, M. Zaharia
Data movement between main memory and the CPU is a major bottleneck in parallel data-intensive applications. In response, researchers have proposed using compilers and intermediate representations (IRs) that apply optimizations such as loop fusion under existing high-level APIs such as NumPy and TensorFlow. Even though these techniques generally do not require changes to user applications, they require intrusive changes to the library itself: often, library developers must rewrite each function using a new IR. In this paper, we propose a new technique called split annotations (SAs) that enables key data movement optimizations over unmodified library functions. SAs only require developers to annotate functions and implement an API that specifies how to partition data in the library. The annotation and API describe how to enable cross-function data pipelining and parallelization, while respecting each function's correctness constraints. We implement a parallel runtime for SAs in a system called Mozart. We show that Mozart can accelerate workloads in libraries such as Intel MKL and Pandas by up to 15x, with no library modifications. Mozart also provides performance gains competitive with solutions that require rewriting libraries, and can sometimes outperform these systems by up to 2x by leveraging existing hand-optimized code.
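The annotation model can be sketched as a decorator that attaches split and merge functions to otherwise unmodified library functions, plus a runtime that pushes one chunk through the whole pipeline at a time so intermediates stay in cache instead of being fully materialized. An illustrative Python sketch (Mozart's real annotation API and parallel runtime are richer; all names here are invented):

```python
def splittable(split, merge):
    """Annotate an unmodified function with how to split its input and
    merge its outputs; the function body itself is untouched."""
    def wrap(fn):
        fn.split, fn.merge = split, merge
        return fn
    return wrap

def chunks(xs, n):
    """Split a list into roughly n contiguous chunks."""
    k = max(1, len(xs) // n)
    return [xs[i:i + k] for i in range(0, len(xs), k)]

def concat(parts):
    return [x for part in parts for x in part]

@splittable(chunks, concat)
def double(xs):
    return [2 * x for x in xs]

@splittable(chunks, concat)
def incr(xs):
    return [x + 1 for x in xs]

def run_pipeline(funcs, data, nsplits=4):
    """Keep each chunk 'hot': push one chunk through the whole pipeline
    before touching the next, then merge the per-chunk results."""
    parts = funcs[0].split(data, nsplits)
    outs = []
    for part in parts:
        for fn in funcs:
            part = fn(part)
        outs.append(part)
    return funcs[-1].merge(outs)
```

The key property mirrored here is that `double` and `incr` were never rewritten; only the annotation tells the runtime how their data can be partitioned and recombined.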
Citations: 14
Teechain
Pub Date : 2018-06-04 DOI: 10.1145/3211890.3211904
Joshua Lind, O. Naor, Ittay Eyal, Florian Kelbert, E. G. Sirer, P. Pietzuch
Blockchain protocols such as Bitcoin [4] exchange payments in a secure and decentralized manner, but their performance is limited due to their need to achieve consensus across a network [1], and each node in the network needs to process the entire blockchain, which introduces major storage limitations. Cryptographic payment channels [2, 5] have been proposed as a second tier on top of the blockchain, allowing efficient direct payments between parties, and the removal of many payments from the blockchain to only the participating parties of the channel. Existing payment channel protocols, however, have two limitations: (i) their security relies on synchronous access to the underlying blockchain, which an attacker may prevent; and (ii) they suffer from long channel establishment times when placing collateral deposits on the blockchain.
Citations: 47
Teechain: a secure payment network with asynchronous blockchain access
Pub Date : 2017-07-18 DOI: 10.1145/3341301.3359627
Joshua Lind, O. Naor, Ittay Eyal, Florian Kelbert, E. G. Sirer, P. Pietzuch
Blockchains such as Bitcoin and Ethereum execute payment transactions securely, but their performance is limited by the need for global consensus. Payment networks overcome this limitation through off-chain transactions. Instead of writing to the blockchain for each transaction, they only settle the final payment balances with the underlying blockchain. When executing off-chain transactions in current payment networks, parties must access the blockchain within bounded time to detect misbehaving parties that deviate from the protocol. This opens a window for attacks in which a malicious party can steal funds by deliberately delaying other parties' blockchain access and prevents parties from using payment networks when disconnected from the blockchain. We present Teechain, the first layer-two payment network that executes off-chain transactions asynchronously with respect to the underlying blockchain. To prevent parties from misbehaving, Teechain uses treasuries, protected by hardware trusted execution environments (TEEs), to establish off-chain payment channels between parties. Treasuries maintain collateral funds and can exchange transactions efficiently and securely, without interacting with the underlying blockchain. To mitigate against treasury failures and to avoid having to trust all TEEs, Teechain replicates the state of treasuries using committee chains, a new variant of chain replication with threshold secret sharing. Teechain achieves at least a 33X higher transaction throughput than the state-of-the-art Lightning payment network. A 30-machine Teechain deployment can handle over 1 million Bitcoin transactions per second.
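The payment-channel core can be sketched as a sequence-numbered balance state that is updated off-chain and written back on-chain only at settlement. A toy Python sketch of the channel bookkeeping idea (signatures, TEE-backed treasuries, and committee-chain replication are all elided; this is not Teechain's protocol):

```python
class Channel:
    """Off-chain payment channel between two parties. Each payment yields
    a new state with a higher sequence number, so later states supersede
    earlier ones; only settle() would touch the blockchain."""
    def __init__(self, deposit_a, deposit_b):
        self.bal = {"a": deposit_a, "b": deposit_b}  # collateral deposits
        self.seq = 0

    def pay(self, src, dst, amount):
        if self.bal[src] < amount:
            raise ValueError("insufficient channel balance")
        self.bal[src] -= amount
        self.bal[dst] += amount
        self.seq += 1                  # monotonically versioned state
        return (self.seq, dict(self.bal))

    def settle(self):
        """Final balances to be written on-chain in one transaction."""
        return dict(self.bal)
```

The danger the paper addresses is visible even in this toy: with classic channels, a party could try to settle an old, lower-sequence state, and detecting that requires timely blockchain access; Teechain removes that synchrony requirement by keeping the authoritative state inside TEE-protected treasuries.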
{"title":"Teechain: a secure payment network with asynchronous blockchain access","authors":"Joshua Lind, O. Naor, Ittay Eyal, Florian Kelbert, E. G. Sirer, P. Pietzuch","doi":"10.1145/3341301.3359627","DOIUrl":"https://doi.org/10.1145/3341301.3359627","url":null,"abstract":"Blockchains such as Bitcoin and Ethereum execute payment transactions securely, but their performance is limited by the need for global consensus. Payment networks overcome this limitation through off-chain transactions. Instead of writing to the blockchain for each transaction, they only settle the final payment balances with the underlying blockchain. When executing off-chain transactions in current payment networks, parties must access the blockchain within bounded time to detect misbehaving parties that deviate from the protocol. This opens a window for attacks in which a malicious party can steal funds by deliberately delaying other parties' blockchain access and prevents parties from using payment networks when disconnected from the blockchain. We present Teechain, the first layer-two payment network that executes off-chain transactions asynchronously with respect to the underlying blockchain. To prevent parties from misbehaving, Teechain uses treasuries, protected by hardware trusted execution environments (TEEs), to establish off-chain payment channels between parties. Treasuries maintain collateral funds and can exchange transactions efficiently and securely, without interacting with the underlying blockchain. To mitigate against treasury failures and to avoid having to trust all TEEs, Teechain replicates the state of treasuries using committee chains, a new variant of chain replication with threshold secret sharing. Teechain achieves at least a 33X higher transaction throughput than the state-of-the-art Lightning payment network. 
A 30-machine Teechain deployment can handle over 1 million Bitcoin transactions per second.","PeriodicalId":331561,"journal":{"name":"Proceedings of the 27th ACM Symposium on Operating Systems Principles","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115566012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 117
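The abstract's committee chains replicate treasury state using chain replication with threshold secret sharing, so that no single TEE must be trusted with a treasury's key material. The standard construction behind "any t of n parties can reconstruct" is Shamir secret sharing over a prime field; a minimal sketch follows (illustrative only — Teechain's actual scheme and parameters are not specified here, and the function names are assumptions):

```python
import random

# A large prime defining the field; the secret must be smaller than this.
PRIME = 2**127 - 1

def split(secret: int, n: int, t: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any t of them can reconstruct it."""
    # Random polynomial of degree t-1 with the secret as constant term.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Recover the secret via Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, PRIME - 2, PRIME) is the modular inverse (Fermat).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

With fewer than t shares the interpolated value reveals nothing about the secret, which is what lets a committee tolerate individual treasury (TEE) failures without any single member holding the key.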
Proceedings of the 27th ACM Symposium on Operating Systems Principles
S. Rosen, P. Denning
{"title":"Proceedings of the 27th ACM Symposium on Operating Systems Principles","authors":"S. Rosen, P. Denning","doi":"10.1145/3341301","DOIUrl":"https://doi.org/10.1145/3341301","url":null,"abstract":"","PeriodicalId":331561,"journal":{"name":"Proceedings of the 27th ACM Symposium on Operating Systems Principles","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1977-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114365402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2