
Latest publications from the 2014 IEEE 7th International Conference on Cloud Computing

Improving MapReduce Performance in a Heterogeneous Cloud: A Measurement Study
Pub Date : 2014-06-27 DOI: 10.1109/CLOUD.2014.61
Xu Zhao, Ling Liu, Qi Zhang, Xiaoshe Dong
Hybrid clouds, geo-distributed clouds, and continuous upgrades of computing, storage, and networking resources in the cloud have driven datacenters to evolve towards heterogeneous clusters. Unfortunately, most MapReduce implementations are designed for homogeneous computing environments and perform poorly in heterogeneous clusters. Although a fair number of research efforts have been dedicated to improving MapReduce performance, there is still a lack of in-depth understanding of the key factors that affect the performance of MapReduce jobs in heterogeneous clusters. In this paper, we present an extensive experimental study of two categories of factors: system configuration and task scheduling. Our measurement study shows that an in-depth understanding of these factors is critical for improving MapReduce performance in a heterogeneous environment. We conclude with five key findings: (1) Early shuffle, though effective for reducing the latency of MapReduce jobs, can impact the performance of map tasks and reduce tasks differently when running on different types of nodes. (2) The two phases of a map task differ in their sensitivity to input block size, and the fraction of time spent in the sort phase varies with block size differently across node types. (3) Scheduling map or reduce tasks dynamically with node-capacity and workload awareness can further enhance job performance and improve resource-consumption efficiency. (4) Although random scheduling of reduce tasks works well in homogeneous clusters, it can significantly degrade performance in heterogeneous clusters when the shuffled data size is large. (5) A phase-aware progress-rate estimation and speculation strategy can provide substantial performance gains over the state-of-the-art speculation scheduler.
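The phase-aware speculation idea in finding (5) can be sketched as follows; this is an illustrative minimal version, not the authors' implementation, and the function names, phase weights, and slowness threshold are all assumptions:

```python
# Illustrative sketch: phase-aware progress-rate estimation for a map task.
# A map task is modeled as phases (e.g. map and sort) whose relative costs
# differ by node type; the per-phase weights here are hypothetical.

def phase_aware_progress(phase_fractions, phase_weights):
    """Overall task progress as a weighted sum of per-phase completion."""
    total = sum(phase_weights)
    return sum(f * w for f, w in zip(phase_fractions, phase_weights)) / total

def should_speculate(progress, elapsed, mean_rate, slow_factor=0.5):
    """Launch a backup copy when a task's progress rate falls well below
    the mean rate of its peers (the threshold is illustrative)."""
    rate = progress / elapsed if elapsed > 0 else 0.0
    return rate < slow_factor * mean_rate
```

Weighting phases separately is what distinguishes this from the stock per-task progress estimate, which can mislabel fast nodes as stragglers when phase costs differ across node types.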
Citations: 6
Evolving Big Data Stream Classification with MapReduce
Pub Date : 2014-06-27 DOI: 10.1109/CLOUD.2014.82
Ahsanul Haque, Brandon Parker, L. Khan, B. Thuraisingham
Big Data Stream mining has some inherent challenges which are not present in traditional data mining. Not only does a Big Data Stream receive a large volume of data continuously, but it may also have different types of features. Moreover, the concepts and features tend to evolve throughout the stream. Traditional data mining techniques are not sufficient to address these challenges. In our current work, we have designed a multi-tiered ensemble-based method, HSMiner, to address the aforementioned challenges and label instances in an evolving Big Data Stream. However, this method requires building a large number of AdaBoost ensembles for each of the numeric features after receiving each new data chunk, which is very costly. Thus, HSMiner may face scalability issues when classifying a Big Data Stream. To address this problem, we propose three approaches to build this large number of AdaBoost ensembles using MapReduce-based parallelism. We compare these approaches from different aspects of design, and empirically show that they allow our base method to achieve significant scalability and speedup.
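The general shape of per-feature ensemble training expressed in MapReduce style can be sketched as below; this is not HSMiner itself, and the trivial "ensemble" stand-in (a min/max summary) only marks where AdaBoost training would run:

```python
# Illustrative sketch: one model per numeric feature, trained in parallel
# MapReduce style. The reduce step is a stand-in for AdaBoost training.
from collections import defaultdict

def map_phase(chunk):
    # Emit (feature_index, column_values) for each numeric feature.
    for j in range(len(chunk[0])):
        yield j, [row[j] for row in chunk]

def reduce_phase(feature_index, columns):
    # Stand-in for training an AdaBoost ensemble on this feature.
    values = [v for col in columns for v in col]
    return feature_index, ("ensemble", min(values), max(values))

def run_job(chunks):
    grouped = defaultdict(list)
    for chunk in chunks:
        for key, col in map_phase(chunk):
            grouped[key].append(col)
    return dict(reduce_phase(k, cols) for k, cols in grouped.items())
```

Keying the shuffle by feature index is what lets the many per-feature ensembles build concurrently instead of sequentially after each chunk.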
Citations: 11
Fast Live Migration with Small IO Performance Penalty by Exploiting SAN in Parallel
Pub Date : 2014-06-27 DOI: 10.1109/CLOUD.2014.16
Soramichi Akiyama, Takahiro Hirofuchi, Ryousei Takano, S. Honiden
Virtualization techniques greatly benefit cloud computing. Live migration enables a datacenter to dynamically replace virtual machines (VMs) without disrupting the services running on them. Efficient live migration is the key to improving the energy efficiency and resource utilization of a datacenter through dynamic placement of VMs. Recent studies have achieved efficient live migration by deleting the page cache of the guest OS to shrink its memory footprint before a migration. However, these studies do not solve the problem of the IO performance penalty after a migration due to the loss of the page cache. We propose an advanced memory-transfer mechanism for live migration, which skips transferring the page cache to shorten total migration time while restoring it transparently to the guest OS via the SAN to prevent the IO performance penalty. To start a migration, our mechanism collects the mapping information between the page cache and disk blocks. During a migration, the source host skips transferring the page cache but transfers the other memory content, while the destination host reads the same data as the page cache from the disk blocks via the SAN. Experiments with web server and database workloads showed that our mechanism reduced total migration time with a significantly small IO performance penalty.
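The core transfer-planning step described above can be sketched as a partition of guest memory pages; this is a simplified illustration of the idea, not the paper's mechanism, and all names and the page/block representation are assumptions:

```python
# Illustrative sketch: split guest memory into pages that must be sent
# over the migration link (anonymous memory) and page-cache pages the
# destination can re-read from shared storage (SAN) in parallel.

def plan_transfer(memory_pages, cache_map):
    """cache_map maps page id -> disk block id for page-cache pages."""
    send, restore = [], []
    for page_id in memory_pages:
        if page_id in cache_map:            # page cache: fetch via SAN
            restore.append((page_id, cache_map[page_id]))
        else:                               # anonymous memory: must be sent
            send.append(page_id)
    return send, restore
```

The win comes from the `restore` list being served by the SAN concurrently with the network transfer of `send`, so the page cache is repopulated without either lengthening the migration or being dropped.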
Citations: 4
FRESH: Fair and Efficient Slot Configuration and Scheduling for Hadoop Clusters
Pub Date : 2014-06-27 DOI: 10.1109/CLOUD.2014.106
Jiayin Wang, Yi Yao, Ying Mao, B. Sheng, N. Mi
Hadoop is an emerging framework for parallel big data processing. While it is becoming popular, Hadoop is too complex for regular users to fully understand all the system parameters and tune them appropriately. Especially when processing a batch of jobs, the default Hadoop settings may cause inefficient resource utilization and unnecessarily prolong execution time. This paper considers an extremely important setting, slot configuration, which by default is fixed and static. We propose an enhanced Hadoop system called FRESH which can derive the best slot setting, dynamically configure slots, and appropriately assign tasks to the available slots. The experimental results show that when serving a batch of MapReduce jobs, FRESH significantly improves the makespan as well as the fairness among jobs.
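To make the fixed-versus-dynamic slot distinction concrete, a minimal workload-driven split might look like the sketch below; this is not the FRESH algorithm, just a hedged illustration of deriving a map/reduce slot ratio from pending work:

```python
# Illustrative sketch: choose how many of a node's task slots to devote
# to map vs. reduce tasks based on the pending workload, instead of the
# fixed, static split Hadoop uses by default.

def slot_split(total_slots, pending_map_tasks, pending_reduce_tasks):
    pending = pending_map_tasks + pending_reduce_tasks
    if pending == 0:
        return total_slots, 0
    map_slots = round(total_slots * pending_map_tasks / pending)
    # Keep at least one slot of each kind while both queues are nonempty.
    if pending_reduce_tasks > 0:
        map_slots = min(total_slots - 1, map_slots)
    map_slots = max(1, map_slots)
    return map_slots, total_slots - map_slots
```

A real scheduler would also account for fairness across jobs, which is the other half of what FRESH optimizes; this sketch only shows the reconfiguration lever itself.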
Citations: 46
Evaluating Dynamic Resource Allocation Strategies in Virtualized Data Centers
Pub Date : 2014-06-27 DOI: 10.1109/CLOUD.2014.52
A. Wolke, Lukas Ziegler
Virtualization technology allows dynamic allocation of VMs to servers, reducing server demand and increasing the energy efficiency of data centers. Dynamic control strategies migrate VMs between servers depending on their actual workload, a concept that promises further improvements in VM allocation efficiency. In this paper we evaluate the applicability of DSAP in a deterministic environment. DSAP is a linear program that calculates VM allocations and live migrations from workload patterns known a priori. Efficiency is evaluated by simulations as well as on an experimental test-bed infrastructure, and results are compared against alternative control approaches that we studied in preliminary work. Our findings are that dynamic allocation can reduce server demand at a reasonable service quality, but countermeasures are required to keep the number of live migrations under control.
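DSAP itself is a linear program and is not reproduced here; as a point of contrast, the kind of simple static-placement baseline that dynamic strategies are typically measured against can be sketched as first-fit packing (an assumption for illustration, not one of the paper's comparison strategies specifically):

```python
# Illustrative baseline (not DSAP): first-fit placement of VMs onto
# servers by predicted demand. Dynamic strategies try to beat the
# server count such a static packing needs.

def first_fit(vm_demands, server_capacity):
    """vm_demands: list of (vm_name, demand). Returns {server: [vms]},
    opening a new server whenever no existing one has room."""
    servers, loads = {}, []
    for name, demand in vm_demands:
        for i, load in enumerate(loads):
            if load + demand <= server_capacity:
                loads[i] += demand
                servers[i].append(name)
                break
        else:
            loads.append(demand)
            servers[len(loads) - 1] = [name]
    return servers
```

An LP like DSAP can instead co-optimize placements over the whole known workload trace, trading extra live migrations for fewer active servers, which is exactly the tension the paper's findings highlight.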
Citations: 12
Mixed-Tenancy in the Wild - Applicability of Mixed-Tenancy for Real-World Enterprise SaaS-Applications
Pub Date : 2014-06-27 DOI: 10.1109/CLOUD.2014.119
S. Ruehl, Malte Rupprecht, Bjorn Morr, Matthias Reinhardt, S. Verclas
Software-as-a-Service (SaaS) is a delivery model whose basic idea is to provide applications to the customer on demand over the Internet. SaaS thereby promotes multi-tenancy as a tool to exploit economies of scale, meaning that a single application instance serves multiple customers. However, a major drawback of SaaS is customers' hesitation to share infrastructure, application code, or data with other tenants, since one of the major threats of multi-tenancy is information disclosure due to a system malfunction, system error, or aggressive actions. So far, the only approach in research to counteract this hesitation has been to enhance the isolation between tenants using the same instance. Our approach (presented in earlier work) tackles this hesitation differently: it allows customers to choose whether, or even with whom, they want to share the application, enabling them to define constraints for individual application components and the underlying infrastructure. The contribution of this paper is an analysis of the real-world applicability of the mixed-tenancy approach. This is done experimentally by applying the mixed-tenancy approach to OpenERP, an open-source enterprise resource planning system used in industry. The conclusion gained from this experiment is that the mixed-tenancy approach is technically realizable in real-world cases. However, there are scenarios where the mixed-tenancy approach is not economically worthwhile for the operator.
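One way to picture the per-component sharing constraints described above is the sketch below; the data model and tenant names are hypothetical illustrations, not the paper's actual constraint language:

```python
# Illustrative sketch: tenants declare, per application component, which
# other tenants they refuse to share that component with; the deployment
# planner may co-locate two tenants only if neither excludes the other.

def can_share(component, tenant_a, tenant_b, constraints):
    """constraints: {(tenant, component): set_of_excluded_tenants}."""
    banned_a = constraints.get((tenant_a, component), set())
    banned_b = constraints.get((tenant_b, component), set())
    return tenant_b not in banned_a and tenant_a not in banned_b
```

Checked over every component and every tenant pair, such constraints carve the application into instances; the paper's economic caveat arises when the constraints force so many separate instances that the operator loses the economies of scale.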
Citations: 5
MediaPaaS: A Cloud-Based Media Processing Platform for Elastic Live Broadcasting
Pub Date : 2014-06-27 DOI: 10.1109/CLOUD.2014.100
Bin Cheng
Mobility is changing how people consume live media content. By staying connected to the Internet from various mobile devices, people expect an enhanced TV viewing experience from anywhere on any device. Therefore, live broadcasting needs to be widely accessible and customizable, instead of being passive content available only on TV. In this paper we present a cloud-based media processing platform, called MediaPaaS, for enabling elastic live broadcasting in the cloud. As an ecosystem-oriented solution for content providers, we outsource complex media processing from both content providers and terminal devices to the cloud. A distributed media processing model is proposed to enable dynamic pipeline composition and cross-pipeline task sharing in the cloud for flexible live content processing. Also, a prediction-based task scheduling algorithm is presented to minimize cloud resource usage without affecting the quality of streams. The MediaPaaS platform allows third-party application developers to extend its capability to enable certain customizations for running live channels. To our knowledge, this paper is the first work to openly discuss the detailed design issues of a cloud-based platform for elastic live broadcasting.
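The prediction-based scheduling idea can be sketched minimally as below; the naive trend forecast and all names are assumptions for illustration, not MediaPaaS's actual algorithm:

```python
# Illustrative sketch: place a new processing task on the instance whose
# predicted near-future load is lowest, rather than its current load.

def predict_load(history, horizon=1):
    """Naive forecast: last observation plus the average recent change."""
    if len(history) < 2:
        return history[-1] if history else 0.0
    deltas = [b - a for a, b in zip(history, history[1:])]
    return history[-1] + horizon * sum(deltas) / len(deltas)

def pick_instance(load_histories):
    """load_histories: {instance_name: [load samples, oldest first]}."""
    return min(load_histories, key=lambda k: predict_load(load_histories[k]))
```

Scheduling on the forecast rather than the instantaneous load is what lets a platform pack tasks tightly (minimizing resource usage) without landing work on an instance whose load is about to spike and degrade stream quality.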
Citations: 19
Introducing SSDs to the Hadoop MapReduce Framework
Pub Date : 2014-06-27 DOI: 10.1109/CLOUD.2014.45
Sangwhan Moon, J. Lee, Yang-Suk Kee
Solid State Drive (SSD) cost-per-bit continues to decrease. Consequently, system architects increasingly consider replacing Hard Disk Drives (HDDs) with SSDs to accelerate Hadoop MapReduce processing. When attempting this, system architects usually realize that SSD characteristics and today's Hadoop framework exhibit mismatches that impede indiscriminate SSD integration. Hence, cost-effective SSD utilization has proved challenging within many Hadoop environments. This paper compares SSD performance to HDD performance within a Hadoop MapReduce framework. It identifies extensible best practices that can exploit SSD benefits within Hadoop frameworks when combined with high network bandwidth and increased parallel storage access. Terasort benchmark results demonstrate that SSDs presently deliver significant cost-effectiveness when they store intermediate Hadoop data, leaving HDDs to store Hadoop Distributed File System (HDFS) source data.
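The takeaway above maps to a concrete deployment choice: point intermediate (shuffle) storage at SSDs while HDFS blocks stay on HDDs. A hedged sketch using standard Hadoop 2.x property names follows; the directory paths are placeholders, and the exact property set depends on your Hadoop version:

```xml
<!-- mapred-site.xml: keep intermediate map output on SSD -->
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/mnt/ssd/mapred/local</value>
</property>

<!-- hdfs-site.xml: keep HDFS source data on HDD -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt/hdd/dfs/data</value>
</property>
```

This split targets the SSD's strengths (small, random, latency-sensitive intermediate IO) at the smaller dataset, while the large sequential HDFS reads remain on cheaper HDD capacity.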
Citations: 44
Progger: An Efficient, Tamper-Evident Kernel-Space Logger for Cloud Data Provenance Tracking
Pub Date : 2014-06-27 DOI: 10.1109/CLOUD.2014.121
R. Ko, M. Will
Cloud data provenance, or "what has happened to my data in the cloud", is a critical data security component which addresses pressing data accountability and data governance issues in cloud computing systems. In this paper, we present Progger (Provenance Logger), a kernel-space logger which potentially empowers all cloud stakeholders to trace their data. Logging from the kernel space enables security analysts to collect provenance from the lowest possible atomic data actions, and enables several higher-level tools to be built for effective end-to-end tracking of data provenance. Within the last few years there has been an increasing number of proposed kernel-space provenance tools, but they have faced several critical data security and integrity problems. The limitations of these prior tools include (1) the inability to provide log tamper-evidence and prevent fake/manual entries, (2) accurate and granular timestamp synchronisation across several machines, (3) log space requirements and growth, and (4) efficient logging of root usage of the system. Progger resolves all of these critical issues and, as such, provides high assurance of data security and data-activity audit. With this in mind, the paper discusses these elements of high-assurance cloud data provenance, describes the design of Progger and its efficiency, and presents compelling results which pave the way for Progger to become a foundation tool for data activity tracking across cloud systems.
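One standard technique for the log tamper-evidence property named in limitation (1) is a hash chain; the sketch below illustrates that general technique in user space for clarity and is not Progger's implementation (Progger itself runs in the kernel):

```python
# Illustrative sketch of tamper-evidence via a hash chain: each log entry
# stores a digest of (previous digest + message), so altering or removing
# any earlier entry invalidates every digest after it.
import hashlib

GENESIS = "0" * 64  # digest preceding the first entry

def append_entry(log, message):
    prev = log[-1][0] if log else GENESIS
    digest = hashlib.sha256((prev + message).encode()).hexdigest()
    log.append((digest, message))
    return log

def verify(log):
    prev = GENESIS
    for digest, message in log:
        if hashlib.sha256((prev + message).encode()).hexdigest() != digest:
            return False
        prev = digest
    return True
```

An auditor who periodically anchors the latest digest elsewhere can then detect any after-the-fact edit to the provenance log, even by a privileged user.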
Citations: 69
A Note on Verifiable Privacy-Preserving Tries
Pub Date : 2014-06-27 DOI: 10.1109/CLOUD.2014.134
Zachary A. Kissel, Jie Wang
We describe a security flaw in the construction of the privacy-preserving trie presented in an ICC'12 paper. The flaw allows a semi-honest-but-curious cloud to forge a verifiable dictionary entry with a set of documents that do not contain the keyword in the query. We then proceed to offer a fix.
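The flaw concerns verifiability: the cloud can pass the client's check with a document set that was never bound to the queried keyword. The toy sketch below — emphatically not the ICC'12 trie construction, and with all names hypothetical — shows the property a sound scheme must provide: the owner commits to each keyword's document set up front, so a forged answer fails verification.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def commit(index):
    """Owner commits to each keyword's document set before outsourcing."""
    return {kw: h((kw + "|" + "|".join(sorted(docs))).encode())
            for kw, docs in index.items()}

def answer(index, kw):
    """Server answers a keyword query with the stored document set."""
    return sorted(index.get(kw, []))

def verify(commitments, kw, docs):
    """Client checks the server's answer against the owner's commitment."""
    return h((kw + "|" + "|".join(sorted(docs))).encode()) == commitments.get(kw)

index = {"cloud": ["d1", "d3"], "trie": ["d2"]}
comms = commit(index)

assert verify(comms, "cloud", answer(index, "cloud"))
# A forged answer substituting documents not bound to the keyword fails:
assert not verify(comms, "cloud", ["d2"])
```

The vulnerability in the note is, in effect, a construction where the second assertion could be made to succeed; the authors' fix restores the binding between keyword and document set.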
Citations: 1