Nathaniel Morris, Christopher Stewart, R. Birke, L. Chen, Jaimie Kelley
Ever-tightening power caps constrain the sustained processing speed of modern processors. With computational sprinting, processors reserve a small power budget that can be spent to increase processing speed for short bursts. Computational sprinting speeds up query executions that would otherwise have slow response times. Common mechanisms used for sprinting include dynamic voltage and frequency scaling (DVFS), core scaling, CPU throttling, and application-specific accelerators.
Early work on modeling computational sprinting. Proceedings of the 2017 Symposium on Cloud Computing, 2017-09-24. https://doi.org/10.1145/3127479.3132691
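To make the "reserved budget for short bursts" idea concrete, here is a minimal sketch under stated assumptions. The abstract does not give the authors' model. This one assumes a reserve that recharges only during idle gaps, is drained at a fixed rate while sprinting, and yields a fixed speedup. `SprintBudget` and all of its parameters are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SprintBudget:
    """Hypothetical model: a reserved energy budget that sprints draw down and
    that recharges while the processor is idle under its sustained power cap."""
    capacity: float        # maximum reserve (arbitrary energy units)
    recharge_rate: float   # units regained per second of idle time
    sprint_cost: float     # units consumed per second of sprinting
    sprint_speedup: float  # speedup over sustained speed while sprinting
    level: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.level = self.capacity  # start with a full reserve

    def run(self, work: float, idle_before: float = 0.0) -> float:
        """Execute a query needing `work` seconds at sustained speed; return its elapsed time."""
        # Recharge during the idle gap before this query, up to capacity.
        self.level = min(self.capacity, self.level + idle_before * self.recharge_rate)
        # Sprint until either the query finishes or the reserve is exhausted.
        sprint_time = min(work / self.sprint_speedup, self.level / self.sprint_cost)
        self.level -= sprint_time * self.sprint_cost
        # Any leftover work runs at sustained speed.
        return sprint_time + (work - sprint_time * self.sprint_speedup)

budget = SprintBudget(capacity=10, recharge_rate=1, sprint_cost=5, sprint_speedup=2)
print(budget.run(4))                 # 2.0: full reserve, the whole query sprints
print(budget.run(4))                 # 4.0: reserve exhausted, sustained speed only
print(budget.run(4, idle_before=5))  # 3.0: partial recharge, partial sprint
```

The three calls show why response time depends on recent history. The same query is fast, then slow, then in between, depending on how much of the reserve is left. Capturing that dependence is what a sprinting model has to do.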
Jaehyun Nam, Hyeonseong Jo, Yeonkeun Kim, Phillip A. Porras, V. Yegneswaran, Seungwon Shin
We design Barista, a new framework that seeks to enable flexible and customizable instantiations of network operating systems (NOSs) supporting diverse design choices. It relies on two key features that harmonize architectural differences across design choices: component synthesis and dynamic event control. With these capabilities, Barista allows operators to easily enable functionalities and dynamically adjust the control flows among those functionalities.
Bridging the architectural gap between NOS design principles in software-defined networks. Proceedings of the 2017 Symposium on Cloud Computing, 2017-09-24. https://doi.org/10.1145/3127479.3132567
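The abstract does not describe Barista's interfaces. As a generic illustration of what "dynamically adjust the control flows" can mean, the sketch below shows a dispatcher whose per-event handler chain can be reordered or pruned while the system runs. The class, method, and component names (`EventController`, `firewall`, `l2_switch`, `monitor`) are hypothetical. The sketch covers only dynamic event control, not component synthesis.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], bool]  # returns False to stop further propagation

class EventController:
    """Hypothetical sketch: components register handlers per event type, and an
    operator can enable, disable, or reorder them at runtime."""

    def __init__(self) -> None:
        self.components: Dict[str, Dict[str, Handler]] = {}
        self.chains: Dict[str, List[str]] = defaultdict(list)

    def register(self, component: str, event: str, handler: Handler) -> None:
        self.components.setdefault(component, {})[event] = handler
        self.chains[event].append(component)

    def set_order(self, event: str, order: List[str]) -> None:
        # Replace the handler chain for one event type; omitted components are disabled.
        self.chains[event] = list(order)

    def dispatch(self, event: str, payload: dict) -> List[str]:
        """Run the chain for `event` and return the components that handled it."""
        visited = []
        for name in self.chains[event]:
            handler = self.components.get(name, {}).get(event)
            if handler is None:
                continue
            visited.append(name)
            if not handler(payload):
                break  # this component consumed the event
        return visited

ctl = EventController()
ctl.register("firewall", "packet_in", lambda p: p.get("port") != 23)  # drops telnet
ctl.register("l2_switch", "packet_in", lambda p: True)
ctl.register("monitor", "packet_in", lambda p: True)
print(ctl.dispatch("packet_in", {"port": 80}))  # ['firewall', 'l2_switch', 'monitor']
ctl.set_order("packet_in", ["monitor", "firewall", "l2_switch"])
print(ctl.dispatch("packet_in", {"port": 23}))  # ['monitor', 'firewall']
```

After the reorder, the monitor sees traffic before the firewall drops it. That change is made without re-registering or restarting any component.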
Savvas Savvides, J. Stephen, Masoud Saeida Ardekani, V. Sundaram, P. Eugster
Cloud computing offers a cost-efficient data analytics platform. However, because of the sensitive nature of their data, many organizations are reluctant to analyze it in public clouds. Both software-based and hardware-based solutions have been proposed to break this stalemate, yet all have substantial limitations. We observe that a main issue cutting across all of these solutions is that they attempt to support confidentiality in data queries in a way that is transparent to the queries. We propose the novel abstraction of secure data types, with corresponding annotations that let programmers conveniently denote security-relevant constraints. Our system Cuttlefish leverages these abstractions through novel compilation techniques to compute data analytics queries in public cloud infrastructures while keeping sensitive data confidential. Cuttlefish encrypts all sensitive data residing in the cloud and employs partially homomorphic encryption schemes to perform operations securely. Where needed, a novel planner engine makes it resort to client-side completion, re-encryption, or secure hardware-based re-encryption using Intel SGX when available. Our evaluation shows that our prototype can execute all queries in standard benchmarks such as TPC-H and TPC-DS, with average overheads of 2.34× and 1.69× respectively compared to a plaintext execution that reveals all data.
Secure data types: a simple abstraction for confidentiality-preserving data analytics. Proceedings of the 2017 Symposium on Cloud Computing, 2017-09-24. https://doi.org/10.1145/3127479.3129256
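"Partially homomorphic" means that a scheme supports some operations directly on ciphertexts, but not all of them. The abstract does not name the schemes Cuttlefish uses. Paillier, a standard additively homomorphic scheme, serves below as a representative example: the cloud can sum an encrypted column without decrypting it. The primes are tiny and hard-coded, so this toy is insecure and for illustration only. The helper names are hypothetical.

```python
import math
import random

# Toy Paillier key pair from small primes (insecure; illustration only).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1
LAM = math.lcm(P - 1, Q - 1)  # math.lcm needs Python 3.9+
MU = pow(LAM, -1, N)          # modular inverse (Python 3.8+); valid because G = N + 1

def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N with fresh randomness r coprime to N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    x = pow(c, LAM, N2)
    return ((x - 1) // N) * MU % N  # L(x) = (x - 1) / N, then multiply by MU

def add_encrypted(c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts (mod N).
    return (c1 * c2) % N2

salaries = [52000, 61000, 47000]           # hypothetical sensitive column
ciphertexts = [encrypt(s) for s in salaries]
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(total, c)        # server side: never sees plaintexts
print(decrypt(total))                      # 160000, decrypted by the client
```

Operations such as comparisons or multiplying two encrypted values are not supported by this scheme. Cases like that are where the abstract says Cuttlefish falls back to client-side completion, re-encryption, or SGX.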
J. Traub, S. Breß, T. Rabl, Asterios Katsifodimos, V. Markl
Real-time sensor data enables diverse applications such as smart metering, traffic monitoring, and sports analysis. In the Internet of Things, billions of sensor nodes form a sensor cloud and offer data streams to analysis systems. However, it is impossible to transfer all available data at maximal frequency to all applications. Therefore, we need to tailor data streams to the demands of applications. We contribute a technique that optimizes communication costs while maintaining the desired accuracy. Our technique schedules reads across huge numbers of sensors based on the data demands of a large number of concurrent queries. We introduce user-defined sampling functions that define the data demand of queries and facilitate various adaptive sampling techniques, which decrease the amount of transferred data. Moreover, we share sensor reads and data transfers among queries. Our experiments with real-world data show that our approach saves up to 87% of data transmissions.
Optimized on-demand data streaming from sensor nodes. Proceedings of the 2017 Symposium on Cloud Computing, 2017-09-24. https://doi.org/10.1145/3127479.3131621
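The following minimal sketch shows how sharing sensor reads among queries cuts transmissions. It assumes each query's sampling function simply returns the integer time instants at which it needs a reading from one sensor, and that reads are shared only when the instants coincide exactly. The paper's sampling functions can be adaptive, and its scheduler is surely more elaborate. `periodic`, `schedule_reads`, and the query names are hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical sampling function: maps a time horizon to the instants a query needs.
SamplingFn = Callable[[int], Set[int]]

def periodic(period: int) -> SamplingFn:
    """Fixed-rate demand; an adaptive function could instead depend on observed values."""
    return lambda horizon: set(range(0, horizon, period))

def schedule_reads(queries: Dict[str, SamplingFn], horizon: int) -> Dict[int, List[str]]:
    """Merge per-query demands so that each instant triggers one shared sensor read."""
    plan: Dict[int, List[str]] = {}
    for name, fn in queries.items():
        for t in fn(horizon):
            plan.setdefault(t, []).append(name)
    return dict(sorted(plan.items()))

queries = {"dashboard": periodic(2), "billing": periodic(4), "alerts": periodic(3)}
plan = schedule_reads(queries, horizon=12)
unshared = sum(len(fn(12)) for fn in queries.values())
print(len(plan), "shared reads vs", unshared, "unshared")  # 8 shared reads vs 13 unshared
```

Each entry of `plan` lists the queries served by a single physical read. For example, the read at t=0 feeds all three queries.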
Viktor Rosenfeld, René Müller, Pınar Tözün, Fatma Özcan
Many popular big data analytics systems today make liberal use of user-defined functions (UDFs) in their programming interface and are written in languages based on the Java Virtual Machine (JVM). This combination creates a barrier when we want to integrate a processing engine written in a language that compiles down to machine code with a JVM-based big data analytics ecosystem. In this paper, we investigate efficient ways of executing UDFs written in Java inside a data processing engine written in C++. While it is possible to call Java code from machine code via the Java Native Interface (JNI), a naive implementation that applies the UDF one row at a time incurs significant overhead, up to an order of magnitude. Instead, we can significantly reduce the cost of JNI calls and data copies between Java and machine code if we execute UDFs on batches of rows and reuse input/output buffers when possible. Our evaluation of these techniques with different scalar UDFs, in a prototype system that combines Spark with a columnar data processing engine written in C++, shows that such a combination does not slow down the execution of SparkSQL queries containing such UDFs. In fact, we find that executing Java UDFs inside an embedded JVM in our C++ engine is 1.12× to 1.53× faster than executing them in Spark alone. Our analysis also shows that compiling Java UDFs directly into machine code does not always outperform strided execution in the JVM.
Processing Java UDFs in a C++ environment. Proceedings of the 2017 Symposium on Cloud Computing, 2017-09-24. https://doi.org/10.1145/3127479.3132022
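The batching argument reduces to counting boundary crossings: row-at-a-time execution pays the per-call cost once per row, whereas batching pays it once per batch. The sketch below models the C++/JNI boundary as a hypothetical `Boundary` object that only counts calls. It makes no real JNI calls, and the UDF and batch size are illustrative assumptions. Only output-buffer reuse is shown.

```python
from typing import Callable, List, Sequence

BatchUDF = Callable[[Sequence[int], List[int], int], None]

class Boundary:
    """Hypothetical stand-in for the C++-to-JVM (JNI) boundary. It only counts
    crossings, because each crossing is where the fixed per-call overhead is paid."""

    def __init__(self) -> None:
        self.crossings = 0

    def call(self, udf: BatchUDF, rows: Sequence[int], out: List[int], n: int) -> None:
        self.crossings += 1
        udf(rows, out, n)

def times_two(rows: Sequence[int], out: List[int], n: int) -> None:
    # Hypothetical scalar UDF applied to n rows, writing into a caller-owned buffer.
    for i in range(n):
        out[i] = rows[i] * 2

def run_udf(data: Sequence[int], batch_size: int, boundary: Boundary) -> List[int]:
    result: List[int] = []
    out = [0] * batch_size  # output buffer allocated once and reused for every batch
    for start in range(0, len(data), batch_size):
        chunk = data[start:start + batch_size]
        boundary.call(times_two, chunk, out, len(chunk))
        result.extend(out[:len(chunk)])
    return result

data = list(range(10_000))
row_wise, batched = Boundary(), Boundary()
r1 = run_udf(data, batch_size=1, boundary=row_wise)
r2 = run_udf(data, batch_size=1024, boundary=batched)
print(r1 == r2, row_wise.crossings, batched.crossings)  # True 10000 10
```

Both runs produce the same results, but the batched run crosses the boundary 1000 times less often. This is where the order-of-magnitude gap in the abstract comes from.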
Chenggang Wu, Jose M. Faleiro, Yihan Lin, J. Hellerstein
Early iterations of datacenter-scale computing were a reaction to the expensive multiprocessors and supercomputers of their day. They were built on clusters of commodity hardware, which at the time consisted of packages with 2–4 CPUs. However, as datacenter-scale computing has matured, cloud vendors have begun to offer denser, more powerful hardware. Today's cloud infrastructure aims to deliver not only reliable and cost-effective computing, but also excellent performance.
Indy: a software system for the dense cloud. Proceedings of the 2017 Symposium on Cloud Computing, 2017-09-24. https://doi.org/10.1145/3127479.3134429
Chen Yang, Qi Guo, Xiaofeng Meng, Rihui Xin, Chunkai Wang
Big data systems for large-scale data processing are now in widespread use. To improve their performance, both academia and industry have expended a great deal of effort analyzing performance bottlenecks. Most big data systems, such as Hadoop and Spark, distribute computation across clusters, so their execution uses the CPU, memory, disk, and network in parallel. The resource with the greatest limiting impact on performance is the one the system is bottlenecked on. For a system designer, tuning that bottleneck resource is an effective way to improve performance. The key question is therefore how to identify the bottleneck resource. The natural approach is to quantify the impact of each of the four major components and identify the one with the greatest impact as the bottleneck resource.
Revisiting performance in big data systems: an resource decoupling approach. Proceedings of the 2017 Symposium on Cloud Computing, 2017-09-24. https://doi.org/10.1145/3127479.3132685
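The selection rule in the abstract can be written down in a few lines. It does not say how a resource is decoupled or measured. This sketch assumes one can obtain the job's runtime with each resource's cost removed in turn. It defines a hypothetical impact factor as the relative runtime reduction and picks the largest. The metric and the measurements are illustrative, not the paper's.

```python
from typing import Dict

RESOURCES = ("cpu", "memory", "disk", "network")

def impact_factors(baseline: float, decoupled: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical metric: fraction of baseline runtime saved when a resource's cost is removed."""
    return {r: (baseline - decoupled[r]) / baseline for r in RESOURCES}

def bottleneck(baseline: float, decoupled: Dict[str, float]) -> str:
    factors = impact_factors(baseline, decoupled)
    return max(factors, key=factors.get)  # resource with the greatest impact

# Hypothetical runtimes (seconds) of one job with each resource decoupled in turn.
measured = {"cpu": 70.0, "memory": 95.0, "disk": 60.0, "network": 90.0}
print(bottleneck(100.0, measured))  # disk
```

A relative factor rather than an absolute saving is used here so that jobs of different lengths can be compared on the same scale. That choice is an assumption of this sketch.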
The increasing interest in the Internet of Things (IoT) suggests that a new source of big data is imminent: the machines and sensors in the IoT ecosystem. The fundamental characteristic of the data produced by these sources is that it is inherently geospatial. In addition, it exhibits unprecedented and unpredictable skew. Thus, big data systems designed for IoT applications must be able to efficiently ingest, index, and query spatial data with heavy and unpredictable skew. Spatial indexing is a well-explored research area, but little attention has been given to efficient distributed spatial indexing. In this paper, we propose Sift, a distributed spatial index, and describe its implementation. Unlike systems that depend on load-balancing mechanisms that kick in after ingestion, Sift distributes incoming data across the distributed structure at indexing time and thus incurs minimal rebalancing overhead. Sift depends only on an underlying key-value store, and hence can be implemented in many existing big data stores. Our evaluations of Sift on a popular open-source data store show promising results: in a distributed environment, Sift achieves up to 8× reduction in indexing overhead while simultaneously reducing query latency and index size by over 2× and 3× respectively, compared to the state of the art.
A scalable distributed spatial index for the internet-of-things. A. Iyer, I. Stoica. Proceedings of the 2017 Symposium on Cloud Computing, 2017-09-24. https://doi.org/10.1145/3127479.3132254
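Since Sift needs only a key-value store, the core question is how 2D points become keys. The abstract does not describe Sift's key layout, and the sketch below does not show its indexing-time distribution. It illustrates one common, generic technique instead: Z-order (Morton) keys in an ordered store. A sorted list stands in for the store, and `morton` and `ZOrderIndex` are hypothetical names.

```python
import bisect
from typing import List, Tuple

def morton(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (x in even bit positions, y in odd)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

class ZOrderIndex:
    """Points kept sorted by Morton key, standing in for an ordered key-value store."""

    def __init__(self) -> None:
        self.keys: List[int] = []
        self.points: List[Tuple[int, int]] = []

    def insert(self, x: int, y: int) -> None:
        k = morton(x, y)
        i = bisect.bisect_right(self.keys, k)
        self.keys.insert(i, k)
        self.points.insert(i, (x, y))

    def query(self, xmin: int, ymin: int, xmax: int, ymax: int) -> List[Tuple[int, int]]:
        # Morton order is monotone in each coordinate, so every point in the box has a
        # key between the keys of the box's two corners; filter out false positives.
        lo = bisect.bisect_left(self.keys, morton(xmin, ymin))
        hi = bisect.bisect_right(self.keys, morton(xmax, ymax))
        return [p for p in self.points[lo:hi]
                if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]

idx = ZOrderIndex()
for p in [(1, 1), (2, 3), (5, 5), (3, 2), (7, 0)]:
    idx.insert(*p)
print(sorted(idx.query(1, 1, 3, 3)))  # [(1, 1), (2, 3), (3, 2)]
```

A single range scan plus a filter is correct but can touch many irrelevant keys for large boxes. Real spatial indexes therefore split the scan into tighter sub-ranges, and a distributed one like Sift must also decide how key ranges map to nodes.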
Cloud servers routinely adopt machine virtualization for high energy efficiency. Virtualization notably improves energy efficiency not only through consolidation but also through dynamic voltage/frequency scaling (DVFS). Accordingly, current hypervisors such as Xen and KVM support power management (PM) policies that statically or dynamically set a voltage/frequency (V/F) level, similar to those deployed in Linux. However, current hypervisors can apply only a single PM policy (i.e., host governor) per physical core. This poses a unique challenge for VMs that share a physical core in a time-shared manner while running applications with opposite runtime characteristics (i.e., heterogeneous VMs). Note that consolidation policies often encourage heterogeneous VMs to share a physical core, since such VMs use different resources in the system [2].
Janus: supporting heterogeneous power management in virtualized environments. Daehoon Kim, Mohammad Alian, Jaehyuk Huh, N. Kim. Proceedings of the 2017 Symposium on Cloud Computing, 2017-09-24. https://doi.org/10.1145/3127479.3132566
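The limitation described above can be shown with a toy time-sliced core. Under a single host governor, every slice runs at that governor's chosen frequency. Under per-VM policies, the hypervisor would switch V/F level according to the owner of each slice. The governors here borrow the names of Linux's `performance` and `powersave` cpufreq governors but are simplified stand-ins. The frequency levels, VM names, and `run_slices` helper are hypothetical, and this is not Janus's mechanism.

```python
from typing import Callable, Dict, List, Optional

# A governor maps a VM's observed CPU utilization (0.0 to 1.0) to a frequency in MHz.
Governor = Callable[[float], int]

FREQS_MHZ = (1200, 1800, 2400)  # hypothetical V/F levels of one physical core

def performance(util: float) -> int:
    return FREQS_MHZ[-1]  # always the highest level

def powersave(util: float) -> int:
    return FREQS_MHZ[0]   # always the lowest level

def run_slices(schedule: List[str], util: Dict[str, float],
               per_vm: Dict[str, Governor],
               host: Optional[Governor] = None) -> List[int]:
    """Frequency set in each time slice of a shared core: the single host governor
    if one is given, otherwise the governor of the VM that owns the slice."""
    freqs = []
    for vm in schedule:
        governor = host if host is not None else per_vm[vm]
        freqs.append(governor(util[vm]))
    return freqs

schedule = ["cpu_vm", "io_vm", "cpu_vm", "io_vm"]  # heterogeneous VMs time-sharing a core
util = {"cpu_vm": 0.95, "io_vm": 0.10}
per_vm = {"cpu_vm": performance, "io_vm": powersave}
print(run_slices(schedule, util, per_vm, host=performance))  # [2400, 2400, 2400, 2400]
print(run_slices(schedule, util, per_vm))                    # [2400, 1200, 2400, 1200]
```

With one host-wide policy, the I/O-bound VM runs at full frequency and wastes energy. The per-VM variant keeps each VM's preferred level across context switches.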
Thomas Pasquier, Xueyuan Han, Mark Goldstein, Thomas Moyer, D. Eyers, M. Seltzer, J. Bacon
Data provenance describes how data came to be in its present form. It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of scientific experiments. We present CamFlow, a whole-system provenance capture mechanism that integrates easily into a PaaS offering. While several prior whole-system provenance systems have captured a comprehensive, systemic, and ubiquitous record of a system's behavior, none have been widely adopted. They either A) impose too much overhead, B) are designed for long-outdated kernel releases and are hard to port to current systems, C) generate too much data, or D) are designed for a single system. CamFlow addresses these shortcomings by 1) leveraging the latest kernel design advances to achieve efficiency; 2) using a self-contained, easily maintainable implementation relying on a Linux Security Module, NetFilter, and other existing kernel facilities; 3) providing a mechanism to tailor the captured provenance data to the needs of the application; and 4) making it easy to integrate provenance across distributed systems. The provenance we capture is streamed to and consumed by tenant-built auditor applications. We illustrate the usability of our implementation by describing three such applications: demonstrating compliance with data regulations, performing fault/intrusion detection, and implementing data loss prevention. We also show how CamFlow can be leveraged to capture meaningful provenance without modifying existing applications.
Practical whole-system provenance capture. Proceedings of the 2017 Symposium on Cloud Computing, 2017-09-24. https://doi.org/10.1145/3127479.3129249
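An auditor application of the kind the abstract describes consumes provenance as a graph and asks questions over it. A data-loss-prevention auditor, for example, might ask whether anything leaving through a socket was derived from a sensitive file. The toy graph below borrows W3C PROV relation names (`used`, `wasGeneratedBy`). It is not CamFlow's record format or API, and every node name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

class ProvenanceGraph:
    """Toy provenance record: each node points to the nodes it was derived from.
    Relation names follow W3C PROV; this is not CamFlow's actual format."""

    def __init__(self) -> None:
        self.parents: Dict[str, Set[str]] = defaultdict(set)

    def used(self, activity: str, entity: str) -> None:
        self.parents[activity].add(entity)  # the activity read the entity

    def was_generated_by(self, entity: str, activity: str) -> None:
        self.parents[entity].add(activity)  # the entity was written by the activity

    def ancestry(self, node: str) -> Set[str]:
        """Everything the node transitively depends on (iterative depth-first search)."""
        seen: Set[str] = set()
        stack = [node]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

g = ProvenanceGraph()
g.used("proc:etl", "file:/data/patients.csv")
g.was_generated_by("file:/tmp/report.csv", "proc:etl")
g.used("proc:upload", "file:/tmp/report.csv")
g.was_generated_by("socket:203.0.113.7:443", "proc:upload")
# Did data derived from patients.csv leave through this socket?
print("file:/data/patients.csv" in g.ancestry("socket:203.0.113.7:443"))  # True
```

The file never touches the socket directly; the link is only visible through the intermediate process and temporary file. That transitive view is what whole-system capture provides over per-application logging.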