Towards Provable Timing-Channel Prevention
G. Heiser, Toby C. Murray, G. Klein
DOI: https://doi.org/10.1145/3421473.3421475
Operating Systems Review (ACM), 54(1), pp. 1-7, 2020-08-31.

We describe our ongoing research that aims to eliminate microarchitectural timing channels through time protection, which removes the root cause of these channels: competition for capacity-limited hardware resources. A proof-of-concept implementation of time protection demonstrated that the approach can be effective and low-overhead, but also that present hardware fails to support the approach in some respects and that an improved hardware-software contract is needed to achieve real security. We have demonstrated that the required mechanisms are not hard to provide, and are working on their inclusion in the RISC-V ISA. Assuming compliant hardware, we outline how we expect to formally prove that timing channels are eliminated.
Verifiable state machines
Srinath T. V. Setty, Sebastian Angel, Jonathan Lee
DOI: https://doi.org/10.1145/3421473.3421479
Operating Systems Review (ACM), 54(1), pp. 40-46, 2020-08-31.

This article describes recent progress in realizing verifiable state machines, a primitive that enables untrusted services to provide cryptographic proofs that they operate correctly. Applications of this primitive range from proving the correct operation of distributed and concurrent cloud services to reducing blockchain transaction costs by leveraging inexpensive off-chain computation without trust.
überSpark: Practical, Provable, End-to-End Guarantees on Commodity Heterogeneous Interconnected Computing Platforms
Amit Vasudevan, Petros Maniatis, R. Martins
DOI: https://doi.org/10.1145/3421473.3421476
Operating Systems Review (ACM), 54(1), pp. 8-22, 2020-08-31.

Today's computing ecosystem, comprising commodity heterogeneous interconnected computing (CHIC) platforms, is increasingly being employed for critical applications, consequently demanding fairly strong end-to-end assurances. However, the generality and system complexity of today's CHIC stack seem to outpace existing tools and methodologies for provable end-to-end guarantees. This paper describes our ongoing research and presents überSpark, a system architecture that argues for structuring the CHIC stack around Universal Object Abstractions (üobjects), a fundamental system abstraction and building block towards practical and provable end-to-end guarantees. überSpark is designed to be realizable on heterogeneous hardware platforms with disparate capabilities, and facilitates compositional end-to-end reasoning and efficient implementation. überSpark also supports the use of multiple verification techniques for properties of different flavors, enabling development-compatible, incremental verification; co-existence and meshing with unverified components at a fine granularity; and wide applicability to all layers of the CHIC stack. We discuss the CHIC stack challenges, illustrate our design decisions, describe the überSpark architecture, present our foundational steps, and outline ongoing and future research activities. We anticipate überSpark will retrofit and unlock a wide range of unprecedented end-to-end provable guarantees on today's continuously evolving CHIC stack.
Noninterference specifications for secure systems
Luke Nelson, James Bornholt, A. Krishnamurthy, E. Torlak, Xi Wang
DOI: https://doi.org/10.1145/3421473.3421478
Operating Systems Review (ACM), 54(1), pp. 31-39, 2020-08-31.

This paper presents an analysis of noninterference specifications used in a range of formally verified systems. The main findings are that these systems use distinct specifications and that they often employ small variations, both of which complicate reasoning about their security implications. We categorize these specifications and discuss their trade-offs for reasoning about information flows in systems.
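To make the flavor of such specifications concrete, here is a toy illustration of checking noninterference on a small state machine. The machine, its "low"/"high" labels, and the two-run check are hypothetical examples constructed for this sketch, not drawn from any of the verified systems the paper surveys.

```python
# Toy noninterference check: two runs that agree on "low" (public) inputs
# must be indistinguishable to a low observer, regardless of "high" (secret)
# inputs. This state machine is a hypothetical example.

def step(state, low_in, high_in):
    """One transition: a public counter and a secret accumulator."""
    return {
        "low": state["low"] + low_in,     # influenced only by low data
        "high": state["high"] ^ high_in,  # secret, never flows to "low"
    }

def low_view(state):
    """Project the state onto what a low observer can see."""
    return state["low"]

def noninterfering(trace_a, trace_b):
    """Run the machine twice; equal low inputs must yield equal low views."""
    sa = {"low": 0, "high": 0}
    sb = {"low": 0, "high": 0}
    for (la, ha), (lb, hb) in zip(trace_a, trace_b):
        assert la == lb, "runs must agree on low inputs"
        sa = step(sa, la, ha)
        sb = step(sb, lb, hb)
        if low_view(sa) != low_view(sb):
            return False
    return True

# Same low inputs (1 then 2), different secrets: low views coincide.
print(noninterfering([(1, 7), (2, 9)], [(1, 3), (2, 5)]))  # True
```

The small variations the paper catalogs show up exactly in how `low_view` and the two-run comparison are defined (e.g., per-step versus end-of-trace observations).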
RCU Usage In the Linux Kernel
P. McKenney, Joel Fernandes, Silas Boyd-Wickizer, J. Walpole
DOI: https://doi.org/10.1145/3421473.3421481
Operating Systems Review (ACM), 54(1), pp. 47-63, 2020-08-31.

Read-copy update (RCU) is a scalable high-performance synchronization mechanism implemented in the Linux kernel. RCU's novel properties include support for concurrent forward progress for readers and writers as well as highly optimized inter-CPU synchronization. RCU was introduced into the Linux kernel eighteen years ago and most subsystems now use RCU. This paper discusses the requirements that drove the development of RCU, the design and API of the Linux RCU implementation, and how kernel developers apply RCU.
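The core read-copy-update pattern can be illustrated with a toy single-process sketch: readers dereference a pointer to an immutable snapshot and never block, while a writer copies the data, updates the copy, and publishes it with one atomic reference swap. This is an illustration of the pattern only, not the Linux kernel API, which additionally defers reclamation of old versions until a grace period has elapsed.

```python
# Toy read-copy-update: lock-free readers, copy-on-write publication.
# Hypothetical sketch; not the kernel's rcu_read_lock()/synchronize_rcu() API.
import threading

class ToyRcu:
    def __init__(self, data):
        self._current = data                  # the published snapshot
        self._writer_lock = threading.Lock()  # serializes writers only

    def read(self):
        # Readers take no lock: fetching the reference is atomic in CPython,
        # and snapshots are never mutated after publication.
        return self._current

    def update(self, mutate):
        with self._writer_lock:
            copy = dict(self._current)  # "copy"
            mutate(copy)                # "update" the private copy
            self._current = copy        # atomic publish; concurrent readers
                                        # keep seeing their old snapshot

rcu = ToyRcu({"route": "eth0"})
snapshot = rcu.read()                   # a reader holds the old version
rcu.update(lambda d: d.__setitem__("route", "eth1"))
print(snapshot["route"], rcu.read()["route"])  # eth0 eth1
```

The missing piece relative to real RCU is reclamation: the kernel must know when all pre-existing readers have finished (a grace period) before freeing the old snapshot, whereas here the garbage collector handles it.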
Symbolic Reasoning for Automatic Signal Placement
Kostas Ferles, Jacob Van Geffen, Işıl Dillig, Y. Smaragdakis
DOI: https://doi.org/10.1145/3421473.3421482
Operating Systems Review (ACM), 54(1), pp. 64-76, 2020-08-31.

Explicit signaling between threads is a perennial cause of bugs in concurrent programs. While there are several runtime techniques to automatically notify threads upon the availability of some shared resource, such techniques are not widely adopted due to their runtime overhead. This paper proposes a new solution based on static analysis for automatically generating a performant explicit-signal program from its corresponding implicit-signal implementation. The key idea is to generate verification conditions that allow us to minimize the number of required signals and unnecessary context switches, while guaranteeing semantic equivalence between the source and target programs. We have implemented our method in a tool called Expresso and evaluate it on challenging benchmarks from prior papers and open-source software. Expresso-generated code significantly outperforms past automatic signaling mechanisms (avg. 1.56x speedup) and closely matches the performance of hand-optimized explicit-signal code.
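The implicit- versus explicit-signal distinction can be sketched with a bounded buffer. The hand-written explicit-signal code below is the kind of target program such a tool aims to generate automatically from a declarative "wait until condition holds" specification; it is an illustrative example, not output of Expresso.

```python
# Explicit-signal bounded buffer: the programmer spells out wait loops and
# notify calls that an implicit-signal version would leave to the runtime.
import threading

class BoundedBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.cond = threading.Condition()

    def put(self, item):
        with self.cond:
            # Implicit-signal style would just declare "wait until not full";
            # explicit-signal code makes the wait loop and signal visible.
            while len(self.items) >= self.capacity:
                self.cond.wait()
            self.items.append(item)
            self.cond.notify()  # a blocked get() may now proceed

    def get(self):
        with self.cond:
            while not self.items:
                self.cond.wait()
            item = self.items.pop(0)
            self.cond.notify()  # a put() blocked on a full buffer may proceed
            return item

buf = BoundedBuffer(2)
results = []
t = threading.Thread(target=lambda: results.append(buf.get()))
t.start()
buf.put("hello")
t.join()
print(results)  # ['hello']
```

The signal-minimization problem the paper addresses is deciding, statically, which of these `notify()` calls are actually needed on each path so that no waiter is woken unnecessarily.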
CuSP
Loc Hoang, Roshan Dathathri, G. Gill, K. Pingali
DOI: https://doi.org/10.1145/3469379.3469385
Operating Systems Review (ACM), 55(1), pp. 47-60, 2020-02-07.

Graph analytics systems must analyze graphs with billions of vertices and edges that require several terabytes of storage. Distributed-memory clusters are often used for analyzing such large graphs, since the main memory of a single machine is usually restricted to a few hundred gigabytes. This requires partitioning the graph among the machines in the cluster. Existing graph analytics systems use a built-in partitioner that incorporates a particular partitioning policy, but the best policy depends on the algorithm, input graph, and platform. Therefore, built-in partitioners are not sufficiently flexible. Stand-alone graph partitioners are available, but they too implement only a few policies. CuSP is a fast streaming edge partitioning framework that permits users to specify the desired partitioning policy at a high level of abstraction and quickly generates high-quality graph partitions. For example, it can partition wdc12, the largest publicly available web-crawl graph with 4 billion vertices and 129 billion edges, in under 2 minutes for clusters with 128 machines. Our experiments show that it can produce quality partitions 6× faster on average than the state-of-the-art stand-alone partitioner in the literature while supporting a wider range of partitioning policies.
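Streaming edge partitioning with a pluggable policy can be sketched in a few lines. The policy shown, assigning each edge by a hash of its source vertex so all out-edges of a vertex land on one machine (an outgoing edge-cut), is one common choice; this is a highly simplified illustration of the abstraction, not CuSP's actual implementation.

```python
# Streaming edge partitioner: one pass over the edge stream, with the
# placement decision delegated to a user-supplied policy function.

def source_hash_policy(src, dst, num_parts):
    # Outgoing edge-cut: all out-edges of a vertex go to the same partition.
    return hash(src) % num_parts

def partition_stream(edges, num_parts, policy):
    parts = [[] for _ in range(num_parts)]
    for src, dst in edges:  # single streaming pass
        parts[policy(src, dst, num_parts)].append((src, dst))
    return parts

edges = [(0, 1), (0, 2), (1, 2), (2, 0), (1, 3)]
parts = partition_stream(edges, 2, source_hash_policy)
# Every edge is assigned exactly once; out-edges of each vertex stay together.
assert sum(len(p) for p in parts) == len(edges)
```

Swapping in a different `policy` function (e.g., hashing the destination for an incoming edge-cut) changes the partitioning strategy without touching the streaming loop, which is the kind of flexibility the framework argues for.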
A Machine Learning Approach to Recommending Files in a Collaborative Work Environment
D. Vengerov, Sesh Jalagam
DOI: https://doi.org/10.1145/3352020.3352028
Operating Systems Review (ACM), 53(1), pp. 46-51, 2019-07-25.

Recommendation of items to users is a problem faced by many companies in a wide spectrum of industries. This problem was traditionally approached in a one-shot manner, such as recommending movies to users based on all the movie ratings observed so far. The evolution of user activity over time was relatively unexplored. This paper presents a Machine Learning approach developed at Box Inc. for making repeated recommendations of files to users in a collaborative work environment. Our results on historical data show that this approach noticeably outperforms the approach currently implemented at Box and also the traditional Matrix Factorization approach.
Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform
Mathias Lécuyer, Riley Spahn, Kiran Vodrahalli, Roxana Geambasu, Daniel J. Hsu
DOI: https://doi.org/10.1145/3352020.3352032
Operating Systems Review (ACM), 53(1), pp. 75-84, 2019-07-25.

We present Sage, the first ML platform that enforces a global differential privacy (DP) guarantee across all models produced from a sensitive data stream. Sage extends the Tensorflow-Extended ML platform with novel mechanisms and DP theory to address operational challenges that arise from incorporating DP into ML training processes. First, to avoid the typical problem with DP systems of "running out of privacy budget" after a pre-established number of training processes, we develop block composition. It is a new DP composition theory that leverages the time-bounded structure of training processes to keep training models endlessly on a sensitive data stream while enforcing event-level DP on the stream. Second, to control the quality of ML models produced by Sage, we develop a novel iterative training process that trains a model on increasing amounts of data from a stream until, with high probability, the model meets developer-configured quality criteria.
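The intuition behind block composition can be sketched with toy accounting: instead of one global budget that eventually runs out, each data block in the stream carries its own budget, a training run charges only the blocks it reads, and newly arriving blocks bring fresh budget, so training never has to stop. This is an illustration of the accounting idea only; Sage's actual composition theory and mechanisms are more involved.

```python
# Toy per-block privacy accounting (hypothetical sketch, not Sage's code).

class BlockBudget:
    def __init__(self, epsilon):
        self.remaining = epsilon

    def charge(self, eps):
        self.remaining -= eps

class StreamAccountant:
    def __init__(self, block_epsilon):
        self.block_epsilon = block_epsilon
        self.blocks = []

    def new_block(self):
        # A new stream block arrives with a fresh privacy budget.
        self.blocks.append(BlockBudget(self.block_epsilon))
        return len(self.blocks) - 1

    def train(self, block_ids, eps):
        # Check-then-charge so a rejected run consumes no budget.
        for b in block_ids:
            if eps > self.blocks[b].remaining:
                raise RuntimeError(f"block {b} out of privacy budget")
        for b in block_ids:
            self.blocks[b].charge(eps)

acct = StreamAccountant(block_epsilon=1.0)
b0, b1 = acct.new_block(), acct.new_block()
acct.train([b0, b1], eps=0.6)   # first model trains on blocks 0 and 1
b2 = acct.new_block()            # the stream moves on; fresh budget arrives
acct.train([b1, b2], eps=0.3)   # a later model can still train
print(acct.blocks[b0].remaining)
```

A model trained this way touches each block with bounded epsilon, so exhausting old blocks never blocks training on new data, which is the property the paper calls training "endlessly" on the stream.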
The Case for Learning-and-System Co-design
C. Liang, Hui Xue, Mao Yang, Lidong Zhou
DOI: https://doi.org/10.1145/3352020.3352031
Operating Systems Review (ACM), 53(1), pp. 68-74, 2019-07-25.

While decision-making in systems is commonly handled with explicit rules and heuristics, machine learning (ML) and deep learning (DL) have been driving a paradigm shift in modern system design. Our decade of experience operationalizing a large production cloud system, Web Search, shows that learning fills the gap in comprehending and taming system design and operation complexity. However, rather than just improving specific ML/DL algorithms or system features, we posit that the key to unlocking the full potential of learning-augmented systems is a principled methodology promoting learning-and-system co-design. On this basis, we present AutoSys, a common framework for the development of learning-augmented systems.