Optimizing Energy, Locality and Priority in a MapReduce Cluster

Yijun Ying, R. Birke, Cheng Wang, L. Chen, N. Gautam
{"title":"Optimizing Energy, Locality and Priority in a MapReduce Cluster","authors":"Yijun Ying, R. Birke, Cheng Wang, L. Chen, N. Gautam","doi":"10.1109/ICAC.2015.30","DOIUrl":null,"url":null,"abstract":"To strike a balance between optimizing for energy versus performance in data centers is extremely tricky because the workloads are significantly different with varying constraints on performance. This issue is exacerbated with the introduction of MapReduce over and above conventional web applications. In particular, with batch versus interactive MapReduce, e.g., Spark system, data availability and locality drive performance while exhibiting different degrees of delay sensitivities. In this paper we consider an energy minimization framework (which is formulated as a concave minimization problem) with explicit modeling of (i) time variability, (ii) data locality, and (iii) delay sensitivity of web applications, batch MapReduce, and interactive MapReduce. Our objective is to maximize the usage of MapReduce servers by delaying the batch MapReduce and offering the execution to web workloads whenever capacity permits. We propose a two-step approach which first employs a controller dynamically allocating servers to the three types of workloads and secondly designs a MapReduce scheduler achieving the optimal data locality. To cater to the stochastic nature of workloads, we use a Makov Decision Process model to design the allocation algorithm at the controller and derive the structure of the optimal. The proposed locality-aware scheduler is specifically engineered to sustain the throughput during the transient overload caused by insufficient server allocation for the batch-MapReduce. We conclude by presenting simulation results from an extensive set of experiments, and these results indicate the efficacy of the methodology proposed by keeping the data center costs to a minimum while ensuring the delay constraints of workloads are met.","PeriodicalId":6643,"journal":{"name":"2015 IEEE International Conference on Autonomic Computing","volume":"1 1","pages":"21-30"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Conference on Autonomic Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAC.2015.30","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21

Abstract

Striking a balance between energy and performance in data centers is difficult because workloads differ significantly and carry different performance constraints. The issue is exacerbated when MapReduce is introduced alongside conventional web applications. In particular, for batch versus interactive MapReduce (e.g., Spark), data availability and locality drive performance, while the two exhibit different degrees of delay sensitivity. In this paper we consider an energy-minimization framework, formulated as a concave minimization problem, that explicitly models (i) the time variability, (ii) the data locality, and (iii) the delay sensitivity of web applications, batch MapReduce, and interactive MapReduce. Our objective is to maximize the utilization of MapReduce servers by delaying batch MapReduce jobs and yielding capacity to web workloads whenever possible. We propose a two-step approach: first, a controller dynamically allocates servers to the three types of workloads; second, a MapReduce scheduler achieves the optimal data locality. To capture the stochastic nature of the workloads, we model the controller's allocation problem as a Markov Decision Process and derive the structure of the optimal policy. The proposed locality-aware scheduler is specifically engineered to sustain throughput during the transient overloads caused by insufficient server allocation to batch MapReduce. We conclude with simulation results from an extensive set of experiments; the results indicate that the proposed methodology keeps data center costs to a minimum while ensuring that the delay constraints of the workloads are met.
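
The abstract describes the two-step approach only at a high level. As a minimal, illustrative sketch (not the paper's actual formulation), the controller's allocation step can be viewed as a small Markov Decision Process in which the state is the batch-MapReduce backlog, the action is how many servers to lend to batch work in a time slot, and the per-slot cost combines energy (active servers) with a delay penalty on queued jobs; value iteration then recovers an allocation policy. All names, the arrival distribution, and the cost constants below are assumptions made for this example.

```python
# Illustrative sketch only (not the paper's model): a tiny Markov Decision
# Process for the controller's allocation step. State = batch-MapReduce
# backlog; action = servers lent to batch work this slot; cost = energy for
# active servers plus a delay penalty on queued jobs. Value iteration yields
# the allocation policy. All constants are assumed for the example.

MAX_BACKLOG = 20                        # truncation of the backlog state space
MAX_SERVERS = 5                         # servers that can be lent to batch MapReduce
ARRIVALS = {0: 0.5, 1: 0.3, 2: 0.2}     # assumed batch-job arrival distribution
ENERGY_COST = 1.0                       # cost per active server per slot
DELAY_COST = 0.4                        # cost per queued batch job per slot
GAMMA = 0.95                            # discount factor


def next_backlog(backlog, servers, arrivals):
    # Toy service model: each lent server completes one queued job per slot.
    return min(MAX_BACKLOG, max(0, backlog - servers) + arrivals)


def value_iteration(tol=1e-6):
    values = [0.0] * (MAX_BACKLOG + 1)
    while True:
        new_values = [0.0] * (MAX_BACKLOG + 1)
        policy = [0] * (MAX_BACKLOG + 1)
        for b in range(MAX_BACKLOG + 1):
            best = float("inf")
            for a in range(MAX_SERVERS + 1):
                cost = ENERGY_COST * a + DELAY_COST * b
                future = sum(p * values[next_backlog(b, a, arr)]
                             for arr, p in ARRIVALS.items())
                q = cost + GAMMA * future
                if q < best:
                    best, policy[b] = q, a
            new_values[b] = best
        if max(abs(x - y) for x, y in zip(values, new_values)) < tol:
            return new_values, policy
        values = new_values


if __name__ == "__main__":
    _, policy = value_iteration()
    print("Servers lent to batch MapReduce at each backlog level:", policy)
```

In such a setup one would look for threshold-type structure in the resulting policy, which is the kind of structural result the abstract refers to; the paper's actual model additionally accounts for web and interactive-MapReduce demand and for data locality in the scheduler, which this sketch omits.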