
Proceedings of the 2017 Symposium on Cloud Computing: Latest Publications

Early work on modeling computational sprinting
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132691
Nathaniel Morris, Christopher Stewart, R. Birke, L. Chen, Jaimie Kelley
Ever-tightening power caps constrain the sustained processing speed of modern processors. With computational sprinting, processors reserve a small power budget that can be used to increase processing speed for short bursts. Computational sprinting speeds up query executions that would otherwise yield slow response times. Common mechanisms used for sprinting include DVFS, core scaling, CPU throttling, and application-specific accelerators.
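To make the sprinting idea concrete, here is a minimal Python sketch of a token-bucket-style sprint controller: the core banks a small reserved power budget and spends it to run short bursts at a boosted speed. All names and parameter values are illustrative assumptions, not the authors' model.

```python
# Illustrative sketch of computational sprinting as a power token bucket.
# All parameters (budget sizes, speeds) are hypothetical, not from the paper.

class SprintController:
    def __init__(self, base_speed=1.0, sprint_speed=2.0,
                 budget_capacity=5.0, refill_rate=0.5):
        self.base_speed = base_speed        # sustained speed under the power cap
        self.sprint_speed = sprint_speed    # boosted speed (e.g. via DVFS / core scaling)
        self.budget = budget_capacity       # reserved power budget (joule-like units)
        self.capacity = budget_capacity
        self.refill_rate = refill_rate      # budget recovered per unit time while idle

    def speed_for(self, query_work, sprint_cost_per_unit=1.0):
        """Return the speed to run a query at, spending budget if sprinting."""
        cost = query_work * sprint_cost_per_unit
        if cost <= self.budget:             # enough reserve: sprint for this burst
            self.budget -= cost
            return self.sprint_speed
        return self.base_speed              # otherwise stay at the sustained speed

    def idle(self, duration):
        """Replenish the reserved budget while the core runs below its cap."""
        self.budget = min(self.capacity, self.budget + self.refill_rate * duration)


ctrl = SprintController()
print(ctrl.speed_for(query_work=2.0))   # 2.0 -> sprints, budget drops to 3.0
print(ctrl.speed_for(query_work=4.0))   # 1.0 -> budget exhausted, runs at base speed
```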
Citations: 2
Bridging the architectural gap between NOS design principles in software-defined networks
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132567
Jaehyun Nam, Hyeonseong Jo, Yeonkeun Kim, Phillip A. Porras, V. Yegneswaran, Seungwon Shin
We design Barista as a new framework that seeks to enable flexible and customizable instantiations of network operating systems (NOSs) supporting diverse design choices, using two key features that harmonize architectural differences across design choices: component synthesis and dynamic event control. With these capabilities, Barista allows operators to easily enable functionalities and dynamically adjust the control flows among those functionalities.
Citations: 1
Secure data types: a simple abstraction for confidentiality-preserving data analytics
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3129256
Savvas Savvides, J. Stephen, Masoud Saeida Ardekani, V. Sundaram, P. Eugster
Cloud computing offers a cost-efficient data analytics platform. However, due to the sensitive nature of data, many organizations are reluctant to analyze their data in public clouds. Both software-based and hardware-based solutions have been proposed to address the stalemate, yet all have substantial limitations. We observe that a main issue cutting across all solutions is that they attempt to support confidentiality in data queries in a way that is transparent to queries. We propose the novel abstraction of secure data types with corresponding annotations for programmers to conveniently denote constraints relevant to security. These abstractions are leveraged by novel compilation techniques in our system Cuttlefish to compute data analytics queries in public cloud infrastructures while keeping sensitive data confidential. Cuttlefish encrypts all sensitive data residing in the cloud and employs partially homomorphic encryption schemes to perform operations securely, resorting to client-side completion, re-encryption, or secure hardware-based re-encryption using Intel's SGX when available, as directed by a novel planner engine. Our evaluation shows that our prototype can execute all queries in standard benchmarks such as TPC-H and TPC-DS with an average overhead of 2.34× and 1.69×, respectively, compared to a plaintext execution that reveals all data.
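As a rough illustration of the secure-data-type abstraction (not Cuttlefish's actual API), the following Python sketch tags an encrypted value with the operations its encryption scheme can evaluate server-side; operations the scheme cannot support would be routed to the client, re-encrypted, or handled in secure hardware by a planner. All names are hypothetical.

```python
# Minimal sketch of a "secure data type": a value tagged with the operations its
# encryption scheme can evaluate server-side. Names and behavior are illustrative
# assumptions, not Cuttlefish's actual API.

class SecureInt:
    def __init__(self, ciphertext, supports=("add",)):
        self.ciphertext = ciphertext      # stand-in for an encrypted value
        self.supports = set(supports)     # ops a partially homomorphic scheme allows

    def __add__(self, other):
        if "add" in self.supports and "add" in other.supports:
            # e.g. an additively homomorphic scheme: combine ciphertexts in the cloud
            return SecureInt(("add", self.ciphertext, other.ciphertext), self.supports)
        raise NotImplementedError("needs re-encryption or client-side completion")

    def __lt__(self, other):
        # comparisons are not supported by an additive scheme: a planner would
        # route this to the client, re-encrypt, or use SGX if available
        raise NotImplementedError("comparison requires client-side completion")


salary = SecureInt("enc(1000)")
bonus = SecureInt("enc(200)")
total = salary + bonus                    # evaluated on encrypted data in the cloud
print(total.ciphertext)                   # ('add', 'enc(1000)', 'enc(200)')
```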
Citations: 7
Optimized on-demand data streaming from sensor nodes
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3131621
J. Traub, S. Breß, T. Rabl, Asterios Katsifodimos, V. Markl
Real-time sensor data enables diverse applications such as smart metering, traffic monitoring, and sports analysis. In the Internet of Things, billions of sensor nodes form a sensor cloud and offer data streams to analysis systems. However, it is impossible to transfer all available data at maximal frequencies to all applications. Therefore, we need to tailor data streams to the demand of applications. We contribute a technique that optimizes communication costs while maintaining the desired accuracy. Our technique schedules reads across huge numbers of sensors based on the data demands of huge numbers of concurrent queries. We introduce user-defined sampling functions that define the data demand of queries and facilitate various adaptive sampling techniques, which decrease the amount of transferred data. Moreover, we share sensor reads and data transfers among queries. Our experiments with real-world data show that our approach saves up to 87% in data transmissions.
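The scheduling idea can be sketched as follows: each query supplies a user-defined sampling function stating when it next needs a reading from a sensor, and a single shared read per sensor satisfies all queries. The interfaces below are assumptions for illustration, not the paper's API.

```python
# Illustrative read scheduler: each query declares, via a user-defined sampling
# function, when it next needs a value from each sensor; one shared read per
# sensor then satisfies all queries. Interfaces are assumptions, not the paper's API.

def fixed_rate(period):
    """Sampling function: demand a reading every `period` seconds."""
    def next_read(sensor_id, last_read_time):
        return last_read_time + period
    return next_read

def schedule_reads(sensors, queries, last_read):
    """Return, per sensor, the earliest time any query demands the next reading."""
    schedule = {}
    for s in sensors:
        demands = [q(s, last_read[s]) for q in queries]
        schedule[s] = min(demands)        # one shared read serves every query
    return schedule

sensors = ["meter-1", "meter-2"]
last_read = {"meter-1": 100.0, "meter-2": 100.0}
queries = [fixed_rate(5.0),               # smart-metering query: every 5 s
           fixed_rate(1.0)]               # traffic-monitoring query: every 1 s
print(schedule_reads(sensors, queries, last_read))
# {'meter-1': 101.0, 'meter-2': 101.0} -> the 1 s demand drives the shared reads
```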
Citations: 35
Processing Java UDFs in a C++ environment
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132022
Viktor Rosenfeld, René Müller, Pınar Tözün, Fatma Özcan
Many popular big data analytics systems today make liberal use of user-defined functions (UDFs) in their programming interface and are written in languages based on the Java Virtual Machine (JVM). This combination creates a barrier when we want to integrate processing engines written in a language that compiles down to machine code with a JVM-based big data analytics ecosystem. In this paper, we investigate efficient ways of executing UDFs written in Java inside a data processing engine written in C++. While it is possible to call Java code from machine code via the Java Native Interface (JNI), a naive implementation that applies the UDF one row at a time incurs a significant overhead, up to an order of magnitude. Instead, we can significantly reduce the costs of JNI calls and data copies between Java and machine code, if we execute UDFs on batches of rows, and reuse input/output buffers when possible. Our evaluation of these techniques using different scalar UDFs, in a prototype system that combines Spark and a columnar data processing engine written in C++, shows that such a combination does not slow down the execution of SparkSQL queries containing such UDFs. In fact, we find that the execution of Java UDFs inside an embedded JVM in our C++ engine is 1.12X to 1.53X faster than executing in Spark alone. Our analysis also shows that compiling Java UDFs directly into machine code is not always beneficial over strided execution in the JVM.
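A simple back-of-the-envelope cost model illustrates why batching pays off: the fixed boundary-crossing overhead is paid once per batch rather than once per row. The Python sketch below uses made-up cost numbers, not measurements from the paper.

```python
# Back-of-the-envelope model of why batching amortizes foreign-call overhead.
# The overhead and per-row costs below are made-up numbers, only for illustration.

def total_cost(rows, batch_size, call_overhead_us, per_row_us):
    calls = -(-rows // batch_size)               # ceil division: one crossing per batch
    return calls * call_overhead_us + rows * per_row_us

rows = 1_000_000
print(total_cost(rows, batch_size=1,    call_overhead_us=1.0, per_row_us=0.1))
# 1.1e6 us: row-at-a-time, boundary crossings dominate (~10x the useful work)
print(total_cost(rows, batch_size=1024, call_overhead_us=1.0, per_row_us=0.1))
# ~1.0e5 us: batching makes the crossing cost negligible
```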
Citations: 11
Indy: a software system for the dense cloud
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3134429
Chenggang Wu, Jose M. Faleiro, Yihan Lin, J. Hellerstein
Early iterations of datacenter-scale computing were a reaction to the expensive multiprocessors and supercomputers of their day. They were built on clusters of commodity hardware, which at the time were packages with 2--4 CPUs. However, as datacenter-scale computing has matured, cloud vendors have provided denser, more powerful hardware. Today's cloud infrastructure aims to deliver not only reliable and cost-effective computing, but also excellent performance.
Citations: 0
Revisiting performance in big data systems: a resource decoupling approach
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132685
Chen Yang, Qi Guo, Xiaofeng Meng, Rihui Xin, Chunkai Wang
Big data systems for large-scale data processing are now in widespread use. To improve their performance, both academia and industry have expended a great deal of effort on the analysis of performance bottlenecks. Most big data systems, such as Hadoop and Spark, support distributed computing across clusters. As a result, system execution always parallelizes the use of the CPU, memory, disk, and network. If a given resource has the greatest limiting impact on performance, the system will be bottlenecked on it. For a system designer, tuning the bottleneck resource is an effective way to improve performance. The key question in this scenario is how to determine the bottleneck resource. The natural approach is to quantify the impact of the four major components and identify the one with the greatest impact factor as the bottleneck resource.
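The following Python sketch illustrates this bottleneck-identification idea in its simplest form; the impact metric used here (mean utilization) is a placeholder assumption, not the paper's actual impact model.

```python
# Sketch of the bottleneck-identification idea: quantify an impact factor for
# each of the four resources and pick the largest. The metric here (mean
# utilization) is a placeholder assumption, not the paper's actual model.

def bottleneck(samples):
    """samples: dict mapping resource name -> list of utilization measurements."""
    impact = {res: sum(vals) / len(vals) for res, vals in samples.items()}
    return max(impact, key=impact.get), impact

samples = {
    "cpu":     [0.62, 0.58, 0.65],
    "memory":  [0.71, 0.69, 0.74],
    "disk":    [0.93, 0.95, 0.90],   # saturated: likely the limiting resource
    "network": [0.40, 0.45, 0.38],
}
resource, impact = bottleneck(samples)
print(resource)   # 'disk'
```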
Citations: 0
A scalable distributed spatial index for the internet-of-things
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132254
A. Iyer, I. Stoica
The increasing interest in the Internet-of-Things (IoT) suggests that a new source of big data is imminent: the machines and sensors in the IoT ecosystem. The fundamental characteristic of the data produced by these sources is that they are inherently geospatial in nature. In addition, they exhibit unprecedented and unpredictable skews. Thus, big data systems designed for IoT applications must be able to efficiently ingest, index and query spatial data having heavy and unpredictable skews. Spatial indexing is a well-explored area of research in the literature, but little attention has been given to the topic of efficient distributed spatial indexing. In this paper, we propose Sift, a distributed spatial index, and its implementation. Unlike systems that depend on load balancing mechanisms that kick in post-ingestion, Sift tries to distribute the incoming data along the distributed structure at indexing time and thus incurs minimal rebalancing overhead. Sift depends only on an underlying key-value store, and hence is implementable in many existing big data stores. Our evaluations of Sift on a popular open source data store show promising results: Sift achieves up to an 8× reduction in indexing overhead while simultaneously reducing the query latency and index size by over 2× and 3×, respectively, in a distributed environment compared to the state-of-the-art.
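One common way to layer a spatial index on a plain key-value store is to interleave coordinate bits into a Z-order (Morton) key so that nearby points share key prefixes. The sketch below shows that generic technique only; it is an assumption for illustration and not necessarily Sift's exact scheme.

```python
# Generic illustration of indexing spatial points in a plain key-value store by
# interleaving coordinate bits (a Z-order / Morton key), so that nearby points
# tend to share key prefixes. A common technique, not necessarily Sift's scheme.

def morton_key(lat, lon, bits=16):
    """Map (lat, lon) to an interleaved-bit key usable as a KV-store key."""
    # normalize each coordinate to an integer in [0, 2^bits)
    x = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
    y = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
    key = 0
    for i in range(bits):                      # interleave the bits of x and y
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return format(key, "0{}x".format(bits // 2))

store = {}                                     # stand-in for any key-value store
store[morton_key(37.77, -122.42)] = "sensor-sf"
store[morton_key(37.78, -122.41)] = "sensor-sf-2"   # nearby -> similar key prefix
print(sorted(store))
```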
Citations: 12
Janus: supporting heterogeneous power management in virtualized environments
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132566
Daehoon Kim, Mohammad Alian, Jaehyuk Huh, N. Kim
Cloud servers have routinely adopted machine virtualization for high energy efficiency. Such virtualization notably improves energy efficiency not only through consolidation, but also through Dynamic Voltage/Frequency Scaling (DVFS). Thus, current hypervisors such as Xen and KVM support power management (PM) policies that statically or dynamically set a Voltage/Frequency (V/F) level, similar to those deployed by Linux. However, current hypervisors can promote only a single PM policy (i.e., host governor) per physical core. This poses a unique challenge for VMs sharing a physical core and running applications with opposite runtime characteristics in a time-shared manner (i.e., heterogeneous VMs); note that the consolidation policy often encourages heterogeneous VMs to share a physical core, since such VMs use different resources in the system [2].
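A minimal sketch of the per-VM policy idea follows: instead of one core-wide host governor, the hypervisor applies each VM's preferred V/F level when that VM is scheduled onto the core. The policy names and frequency levels are illustrative assumptions, not Janus's actual interface.

```python
# Sketch of per-VM power-management policies on a shared physical core: when the
# hypervisor schedules a VM, it applies that VM's preferred V/F level instead of
# a single core-wide host governor. Names and levels are illustrative assumptions.

VF_LEVELS = {"powersave": 1.2, "ondemand": 2.0, "performance": 3.4}  # GHz, made up

class Core:
    def __init__(self):
        self.freq_ghz = VF_LEVELS["ondemand"]

    def schedule(self, vm):
        # heterogeneous VMs time-share the core, each with its own PM policy
        self.freq_ghz = VF_LEVELS[vm["policy"]]
        return "running {} at {:.1f} GHz".format(vm["name"], self.freq_ghz)

core = Core()
vms = [{"name": "batch-vm", "policy": "powersave"},
       {"name": "latency-vm", "policy": "performance"}]
for vm in vms:
    print(core.schedule(vm))
```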
Citations: 0
Practical whole-system provenance capture
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3129249
Thomas Pasquier, Xueyuan Han, Mark Goldstein, Thomas Moyer, D. Eyers, M. Seltzer, J. Bacon
Data provenance describes how data came to be in its present form. It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of scientific experiments. We present CamFlow, a whole-system provenance capture mechanism that integrates easily into a PaaS offering. While there have been several prior whole-system provenance systems that captured a comprehensive, systemic and ubiquitous record of a system's behavior, none have been widely adopted. They either A) impose too much overhead, B) are designed for long-outdated kernel releases and are hard to port to current systems, C) generate too much data, or D) are designed for a single system. CamFlow addresses these shortcomings by: 1) leveraging the latest kernel design advances to achieve efficiency; 2) using a self-contained, easily maintainable implementation relying on a Linux Security Module, NetFilter, and other existing kernel facilities; 3) providing a mechanism to tailor the captured provenance data to the needs of the application; and 4) making it easy to integrate provenance across distributed systems. The provenance we capture is streamed and consumed by tenant-built auditor applications. We illustrate the usability of our implementation by describing three such applications: demonstrating compliance with data regulations; performing fault/intrusion detection; and implementing data loss prevention. We also show how CamFlow can be leveraged to capture meaningful provenance without modifying existing applications.
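As an illustration of a tenant-built auditor (not CamFlow's actual record format), the sketch below consumes a stream of provenance edges and flags a data-loss path from a sensitive file to a network socket. The record layout and node names are hypothetical.

```python
# Sketch of a tenant-built auditor consuming a stream of provenance records
# (edges of an information-flow graph) to flag data loss: any path from a
# sensitive file to a network socket. The record format is an assumption,
# not CamFlow's actual serialization.

from collections import defaultdict, deque

def reaches(edges, sources, targets):
    """Return True if any target node is reachable from any source node."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    queue, seen = deque(sources), set(sources)
    while queue:
        node = queue.popleft()
        if node in targets:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# provenance stream: (information flows from -> to)
stream = [("file:/etc/secrets", "process:worker"),
          ("process:worker", "socket:10.0.0.5:443")]
print(reaches(stream, sources={"file:/etc/secrets"},
              targets={"socket:10.0.0.5:443"}))   # True -> raise a DLP alert
```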
Citations: 114