Energy proportionality and workload consolidation for latency-critical applications

Proceedings of the Sixth ACM Symposium on Cloud Computing; Pub Date: 2015-08-27; DOI: 10.1145/2806777.2806848

G. Prekas, Mia Primorac, A. Belay, C. Kozyrakis, Edouard Bugnion


Citations: 71

Abstract

Energy proportionality and workload consolidation are important objectives towards increasing efficiency in large-scale datacenters. Our work focuses on achieving these goals in the presence of applications with μs-scale tail latency requirements. Such applications represent a growing subset of datacenter workloads and are typically deployed on dedicated servers, which is the simplest way to ensure low tail latency across all loads. Unfortunately, it also leads to low energy efficiency and low resource utilization during the frequent periods of medium or low load. We present the OS mechanisms and dynamic control needed to adjust core allocation and voltage/frequency settings based on the measured delays for latency-critical workloads. This allows for energy proportionality and frees the maximum amount of resources per server for other background applications, while respecting service-level objectives. Monitoring hardware queue depths allows us to detect increases in queuing latencies. Carefully coordinated adjustments to the NIC's packet redirection table enable us to reassign flow groups between the threads of a latency-critical application in milliseconds without dropping or reordering packets. We compare the efficiency of our solution to the Pareto-optimal frontier of 224 distinct static configurations. Dynamic resource control saves 44%–54% of processor energy, which corresponds to 85%–93% of the Pareto-optimal upper bound. Dynamic resource control also allows background jobs to run at 32%–46% of their standalone throughput, which corresponds to 82%–92% of the Pareto bound.
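The abstract describes adjusting core allocation and voltage/frequency settings in response to measured queuing delay, but it does not spell out the control policy. The following is a minimal sketch of one possible feedback step, assuming a simple two-threshold rule on queue depth. The names `Allocation` and `step`, the frequency table, the core limit, the watermarks, and the choice to change frequency before core count are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; none of these come from the paper.
FREQS_GHZ = [1.2, 1.6, 2.0, 2.4]  # assumed available frequency steps
MAX_CORES = 8                     # assumed cores available to the application
HIGH_WATERMARK = 32               # queue depth above which resources are added
LOW_WATERMARK = 4                 # queue depth below which resources are released


@dataclass(frozen=True)
class Allocation:
    cores: int      # cores assigned to the latency-critical application
    freq_idx: int   # index into FREQS_GHZ


def step(alloc: Allocation, queue_depth: int) -> Allocation:
    """Return the allocation for the next control interval.

    Assumed policy: when the queue is deep, raise frequency first and add a
    core only at the top frequency; when the queue is shallow, lower frequency
    first and release a core only at the bottom frequency.
    """
    if queue_depth > HIGH_WATERMARK:
        if alloc.freq_idx < len(FREQS_GHZ) - 1:
            return Allocation(alloc.cores, alloc.freq_idx + 1)
        if alloc.cores < MAX_CORES:
            return Allocation(alloc.cores + 1, alloc.freq_idx)
    elif queue_depth < LOW_WATERMARK:
        if alloc.freq_idx > 0:
            return Allocation(alloc.cores, alloc.freq_idx - 1)
        if alloc.cores > 1:
            return Allocation(alloc.cores - 1, alloc.freq_idx)
    # Queue depth is within the band, or a limit was reached: keep the allocation.
    return alloc


if __name__ == "__main__":
    alloc = Allocation(cores=2, freq_idx=0)
    for depth in [50, 50, 50, 50, 10, 1, 1]:  # made-up queue-depth samples
        alloc = step(alloc, depth)
        print(depth, alloc.cores, FREQS_GHZ[alloc.freq_idx])
```

Cores released by such a policy would be the ones made available to background jobs. A real controller would also need the flow-group reassignment the abstract mentions, which this sketch leaves out.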

