Prometheus: online estimation of optimal memory demands for workers in in-memory distributed computation

Guoyao Xu, Chengzhong Xu
{"title":"Prometheus: online estimation of optimal memory demands for workers in in-memory distributed computation","authors":"Guoyao Xu, Chengzhong Xu","doi":"10.1145/3127479.3132689","DOIUrl":null,"url":null,"abstract":"Modern in-memory distributed computation frameworks like Spark adequately leverage memory resources to cache intermediate data across multi-stage tasks in pre-allocated worker processes, so as to speedup executions. They rely on a cluster resource manager like Yarn or Mesos to pre-reserve specific amount of CPU and memory for workers ahead of task scheduling. Since a worker is executed for an entire application and runs multiple batches of DAG tasks from multi-stages, its memory demands change over time [3]. Resource managers like Yarn solve the non-trivial allocation problem of determining right amounts of memory provision for workers by requiring users to make explicit reservations before execution. Since the underlying execution frameworks, workload and complex codebases are invisible, users tend to over-estimate or under-estimate workers' demands, leading to over-provisioning or under-provisioning of memory resources. We observed there exists a performance inflection point with respect to memory reservation per stage of applications. After that, performance fluctuates little even under over-provisioned memory [1]. It is the minimum required memory to achieve expected nearly optimal performance. We call these capacities as optimal demands. They are capacity cut lines to divide over-provisioning and under-provisioning. To relieve the burden of users, and provide guarantees over both maximum cluster memory utilization and optimal application performance, we present a system namely Prometheus for online estimation of optimal memory demand for workers per future stage, without involving users' efforts. The procedure to explore optimal demands is essentially a search problem correlated memory reservation and performance. Most existing searching methods [2] need multiple profiling runs or prior historical execution statistics, which are not applicable to online configuration of newly submitted or non-recurring jobs. The recurring applications' optimal demands also change over time under variations of input datasets, algorithmic parameters or source code. It becomes too expensive and infeasible to rebuild new search model for every setting. Prometheus adopts a two-step approach to tackle the problem: 1) For newly submitted or non-recurring jobs, we do profiling and histogram frequency analysis of job's runtime memory footprints from only one pilot run under over-provisioned memory. It achieves a highly accurate (over 80% accuracy) initial estimation of optimal demands per stage for each worker. By analyzing frequency of past memory usages per sampling time, we efficiently estimate probability of base demands and distinguish them from unnecessarily excessive usages. Allocation of base demands tends to achieve near-optimal performance, so as to approach optimal demands. 2) Histogram frequency analysis algorithm has an intrinsic property of self-decay. For subsequent recurring submissions, Prometheus exploits this property to efficiently perform a recursive search. It obtains stepwise refinement and rapidly reaches optimal demands through few recurring executions. We demonstrate this recursive search reaches up to 3--4 times lower searching overheads and 2--4 times more accuracy compared with alternative solutions like random search. We validate the design by implementing Prometheus atop of Spark and Yarn. The experimental results show that it achieves an ultimate accuracy of more than 92%. By deploying Prometheus and reserving memory according to the optimal demands, one could improve cluster memory utilization by about 40%. It simultaneously reduces individual application execution time by over 35% comparing to the state-of-the-art approaches. Overall, the optimal memory demands knowledge provided by Prometheus enables cluster managers to effectively avoid over-provisioning or under-provisioning of memory resources, and achieve optimal application performance and maximum resource efficiency.","PeriodicalId":20679,"journal":{"name":"Proceedings of the 2017 Symposium on Cloud Computing","volume":"110 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 Symposium on Cloud Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3127479.3132689","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

Modern in-memory distributed computation frameworks like Spark adequately leverage memory resources to cache intermediate data across multi-stage tasks in pre-allocated worker processes, so as to speedup executions. They rely on a cluster resource manager like Yarn or Mesos to pre-reserve specific amount of CPU and memory for workers ahead of task scheduling. Since a worker is executed for an entire application and runs multiple batches of DAG tasks from multi-stages, its memory demands change over time [3]. Resource managers like Yarn solve the non-trivial allocation problem of determining right amounts of memory provision for workers by requiring users to make explicit reservations before execution. Since the underlying execution frameworks, workload and complex codebases are invisible, users tend to over-estimate or under-estimate workers' demands, leading to over-provisioning or under-provisioning of memory resources. We observed there exists a performance inflection point with respect to memory reservation per stage of applications. After that, performance fluctuates little even under over-provisioned memory [1]. It is the minimum required memory to achieve expected nearly optimal performance. We call these capacities as optimal demands. They are capacity cut lines to divide over-provisioning and under-provisioning. To relieve the burden of users, and provide guarantees over both maximum cluster memory utilization and optimal application performance, we present a system namely Prometheus for online estimation of optimal memory demand for workers per future stage, without involving users' efforts. The procedure to explore optimal demands is essentially a search problem correlated memory reservation and performance. Most existing searching methods [2] need multiple profiling runs or prior historical execution statistics, which are not applicable to online configuration of newly submitted or non-recurring jobs. The recurring applications' optimal demands also change over time under variations of input datasets, algorithmic parameters or source code. It becomes too expensive and infeasible to rebuild new search model for every setting. Prometheus adopts a two-step approach to tackle the problem: 1) For newly submitted or non-recurring jobs, we do profiling and histogram frequency analysis of job's runtime memory footprints from only one pilot run under over-provisioned memory. It achieves a highly accurate (over 80% accuracy) initial estimation of optimal demands per stage for each worker. By analyzing frequency of past memory usages per sampling time, we efficiently estimate probability of base demands and distinguish them from unnecessarily excessive usages. Allocation of base demands tends to achieve near-optimal performance, so as to approach optimal demands. 2) Histogram frequency analysis algorithm has an intrinsic property of self-decay. For subsequent recurring submissions, Prometheus exploits this property to efficiently perform a recursive search. It obtains stepwise refinement and rapidly reaches optimal demands through few recurring executions. We demonstrate this recursive search reaches up to 3--4 times lower searching overheads and 2--4 times more accuracy compared with alternative solutions like random search. We validate the design by implementing Prometheus atop of Spark and Yarn. The experimental results show that it achieves an ultimate accuracy of more than 92%. By deploying Prometheus and reserving memory according to the optimal demands, one could improve cluster memory utilization by about 40%. It simultaneously reduces individual application execution time by over 35% comparing to the state-of-the-art approaches. Overall, the optimal memory demands knowledge provided by Prometheus enables cluster managers to effectively avoid over-provisioning or under-provisioning of memory resources, and achieve optimal application performance and maximum resource efficiency.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Prometheus:在线估计内存分布式计算中工作人员的最佳内存需求
像Spark这样的现代内存分布式计算框架充分利用内存资源,在预分配的工作进程中缓存跨多阶段任务的中间数据,从而加快执行速度。它们依赖于像Yarn或Mesos这样的集群资源管理器,在任务调度之前为工人预先预留特定数量的CPU和内存。由于工作线程是为整个应用程序执行的,并且在多个阶段运行多批DAG任务,因此其内存需求会随着时间的推移而变化[3]。像Yarn这样的资源管理器通过要求用户在执行前进行显式的预留,解决了为worker确定适当内存供应的分配问题。由于底层的执行框架、工作负载和复杂的代码库是不可见的,用户倾向于高估或低估工人的需求,从而导致内存资源的过度供应或不足供应。我们观察到,在应用程序的每个阶段的内存保留方面存在一个性能拐点。在此之后,即使在内存过度配置的情况下,性能波动也很小[1]。这是实现预期的近乎最佳性能所需的最小内存。我们把这些能力称为最优需求。它们是划分供应过剩和供应不足的容量分割线。为了减轻用户的负担,并提供最大集群内存利用率和最佳应用程序性能的保证,我们提出了一个名为Prometheus的系统,用于在线估计每个未来阶段工作人员的最佳内存需求,而不涉及用户的努力。探索最优需求的过程本质上是一个与内存保留和性能相关的搜索问题。大多数现有的搜索方法[2]需要多次分析运行或先前的历史执行统计数据,这不适用于新提交或非循环作业的在线配置。在输入数据集、算法参数或源代码的变化下,反复出现的应用程序的最佳需求也会随着时间的推移而变化。为每个设置重新构建新的搜索模型变得过于昂贵和不可行的。Prometheus采用两步方法来解决这个问题:1)对于新提交的或非重复出现的作业,我们对作业的运行时内存占用进行分析和直方图频率分析,这些作业只在内存过度供应的情况下进行一次试验运行。它实现了对每个工人每个阶段的最佳需求的高度精确(超过80%的精度)的初始估计。通过分析每个采样时间过去内存使用的频率,我们有效地估计基本需求的概率,并将它们与不必要的过度使用区分开来。基本需求的分配趋向于达到接近最优的性能,从而接近最优需求。2)直方图频率分析算法具有固有的自衰减特性。对于后续的重复提交,Prometheus利用此属性有效地执行递归搜索。它通过少量的重复执行,逐步细化并迅速达到最优需求。我们证明,与随机搜索等替代解决方案相比,这种递归搜索的搜索开销降低了3- 4倍,准确率提高了2- 4倍。我们通过在Spark和Yarn之上实现Prometheus来验证设计。实验结果表明,该方法的最终准确率达到92%以上。通过部署Prometheus并根据最佳需求保留内存,可以将集群内存利用率提高约40%。与最先进的方法相比,它同时将单个应用程序的执行时间减少了35%以上。总的来说,Prometheus提供的最优内存需求知识使集群管理器能够有效地避免内存资源的过量供应或不足供应,实现最优的应用程序性能和最大的资源效率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Janus: supporting heterogeneous power management in virtualized environments On-demand virtualization for live migration in bare metal cloud Preserving I/O prioritization in virtualized OSes To edge or not to edge? Indy: a software system for the dense cloud
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1