Optimizing Energy, Locality and Priority in a MapReduce Cluster

Yijun Ying, R. Birke, Cheng Wang, L. Chen, N. Gautam
{"title":"Optimizing Energy, Locality and Priority in a MapReduce Cluster","authors":"Yijun Ying, R. Birke, Cheng Wang, L. Chen, N. Gautam","doi":"10.1109/ICAC.2015.30","DOIUrl":null,"url":null,"abstract":"To strike a balance between optimizing for energy versus performance in data centers is extremely tricky because the workloads are significantly different with varying constraints on performance. This issue is exacerbated with the introduction of MapReduce over and above conventional web applications. In particular, with batch versus interactive MapReduce, e.g., Spark system, data availability and locality drive performance while exhibiting different degrees of delay sensitivities. In this paper we consider an energy minimization framework (which is formulated as a concave minimization problem) with explicit modeling of (i) time variability, (ii) data locality, and (iii) delay sensitivity of web applications, batch MapReduce, and interactive MapReduce. Our objective is to maximize the usage of MapReduce servers by delaying the batch MapReduce and offering the execution to web workloads whenever capacity permits. We propose a two-step approach which first employs a controller dynamically allocating servers to the three types of workloads and secondly designs a MapReduce scheduler achieving the optimal data locality. To cater to the stochastic nature of workloads, we use a Makov Decision Process model to design the allocation algorithm at the controller and derive the structure of the optimal. The proposed locality-aware scheduler is specifically engineered to sustain the throughput during the transient overload caused by insufficient server allocation for the batch-MapReduce. We conclude by presenting simulation results from an extensive set of experiments, and these results indicate the efficacy of the methodology proposed by keeping the data center costs to a minimum while ensuring the delay constraints of workloads are met.","PeriodicalId":6643,"journal":{"name":"2015 IEEE International Conference on Autonomic Computing","volume":"1 1","pages":"21-30"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Conference on Autonomic Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAC.2015.30","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21

Abstract

Striking a balance between energy and performance in data centers is difficult because workloads differ significantly and carry different performance constraints. The issue is exacerbated when MapReduce is introduced alongside conventional web applications. In particular, for batch versus interactive MapReduce (e.g., Spark), data availability and locality drive performance, while the two exhibit different degrees of delay sensitivity. In this paper we consider an energy-minimization framework, formulated as a concave minimization problem, that explicitly models (i) the time variability, (ii) the data locality, and (iii) the delay sensitivity of web applications, batch MapReduce, and interactive MapReduce. Our objective is to maximize the utilization of MapReduce servers by delaying batch MapReduce jobs and yielding capacity to web workloads whenever possible. We propose a two-step approach: first, a controller dynamically allocates servers to the three types of workloads; second, a MapReduce scheduler achieves the optimal data locality. To capture the stochastic nature of the workloads, we model the controller's allocation problem as a Markov Decision Process and derive the structure of the optimal policy. The proposed locality-aware scheduler is specifically engineered to sustain throughput during the transient overloads caused by insufficient server allocation to batch MapReduce. We conclude with simulation results from an extensive set of experiments; the results indicate that the proposed methodology keeps data center costs to a minimum while ensuring that the delay constraints of the workloads are met.
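
The abstract describes the two-step approach only at a high level. As a minimal, illustrative sketch (not the paper's actual formulation), the controller's allocation step can be viewed as a small Markov Decision Process in which the state is the batch-MapReduce backlog, the action is how many servers to lend to batch work in a time slot, and the per-slot cost combines energy (active servers) with a delay penalty on queued jobs; value iteration then recovers an allocation policy. All names, the arrival distribution, and the cost constants below are assumptions made for this example.

```python
# Illustrative sketch only (not the paper's model): a tiny Markov Decision
# Process for the controller's allocation step. State = batch-MapReduce
# backlog; action = servers lent to batch work this slot; cost = energy for
# active servers plus a delay penalty on queued jobs. Value iteration yields
# the allocation policy. All constants are assumed for the example.

MAX_BACKLOG = 20                        # truncation of the backlog state space
MAX_SERVERS = 5                         # servers that can be lent to batch MapReduce
ARRIVALS = {0: 0.5, 1: 0.3, 2: 0.2}     # assumed batch-job arrival distribution
ENERGY_COST = 1.0                       # cost per active server per slot
DELAY_COST = 0.4                        # cost per queued batch job per slot
GAMMA = 0.95                            # discount factor


def next_backlog(backlog, servers, arrivals):
    # Toy service model: each lent server completes one queued job per slot.
    return min(MAX_BACKLOG, max(0, backlog - servers) + arrivals)


def value_iteration(tol=1e-6):
    values = [0.0] * (MAX_BACKLOG + 1)
    while True:
        new_values = [0.0] * (MAX_BACKLOG + 1)
        policy = [0] * (MAX_BACKLOG + 1)
        for b in range(MAX_BACKLOG + 1):
            best = float("inf")
            for a in range(MAX_SERVERS + 1):
                cost = ENERGY_COST * a + DELAY_COST * b
                future = sum(p * values[next_backlog(b, a, arr)]
                             for arr, p in ARRIVALS.items())
                q = cost + GAMMA * future
                if q < best:
                    best, policy[b] = q, a
            new_values[b] = best
        if max(abs(x - y) for x, y in zip(values, new_values)) < tol:
            return new_values, policy
        values = new_values


if __name__ == "__main__":
    _, policy = value_iteration()
    print("Servers lent to batch MapReduce at each backlog level:", policy)
```

In such a setup one would look for threshold-type structure in the resulting policy, which is the kind of structural result the abstract refers to; the paper's actual model additionally accounts for web and interactive-MapReduce demand and for data locality in the scheduler, which this sketch omits.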