Lukewarm serverless functions: characterization and optimization

Proceedings of the 49th Annual International Symposium on Computer Architecture Pub Date : 2022-06-11 DOI:10.1145/3470496.3527390

David Schall, Artemiy Margaritov, Dmitrii Ustiugov, Andreas Sandberg, Boris Grot

{"title":"Lukewarm serverless functions: characterization and optimization","authors":"David Schall, Artemiy Margaritov, Dmitrii Ustiugov, Andreas Sandberg, Boris Grot","doi":"10.1145/3470496.3527390","DOIUrl":null,"url":null,"abstract":"Serverless computing has emerged as a widely-used paradigm for running services in the cloud. In serverless, developers organize their applications as a set of functions, which are invoked on-demand in response to events, such as an HTTP request. To avoid long start-up delays of launching a new function instance, cloud providers tend to keep recently-triggered instances idle (or warm) for some time after the most recent invocation in anticipation of future invocations. Thus, at any given moment on a server, there may be thousands of warm instances of various functions whose executions are interleaved in time based on incoming invocations. This paper observes that (1) there is a high degree of interleaving among warm instances on a given server; (2) the individual warm functions are invoked relatively infrequently, often at the granularity of seconds or minutes; and (3) many function invocations complete within a few milliseconds. Interleaved execution of rarely invoked functions on a server leads to thrashing of each function's microarchitectural state between invocations. Meanwhile, the short execution time of a function impedes amortization of the warm-up latency of the cache hierarchy, causing a 31--114% increase in CPI compared to execution with warm microarchitectural state. We identify on-chip misses for instructions as a major contributor to the performance loss. In response we propose Jukebox, a record-and-replay instruction prefetcher specifically designed for reducing the start-up latency of warm function instances. Jukebox requires just 32KB of metadata per function instance and boosts performance by an average of 18.7% for a wide range of functions, which translates into a corresponding throughput improvement.","PeriodicalId":337932,"journal":{"name":"Proceedings of the 49th Annual International Symposium on Computer Architecture","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 49th Annual International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3470496.3527390","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

Abstract

Serverless computing has emerged as a widely-used paradigm for running services in the cloud. In serverless, developers organize their applications as a set of functions, which are invoked on-demand in response to events, such as an HTTP request. To avoid long start-up delays of launching a new function instance, cloud providers tend to keep recently-triggered instances idle (or warm) for some time after the most recent invocation in anticipation of future invocations. Thus, at any given moment on a server, there may be thousands of warm instances of various functions whose executions are interleaved in time based on incoming invocations. This paper observes that (1) there is a high degree of interleaving among warm instances on a given server; (2) the individual warm functions are invoked relatively infrequently, often at the granularity of seconds or minutes; and (3) many function invocations complete within a few milliseconds. Interleaved execution of rarely invoked functions on a server leads to thrashing of each function's microarchitectural state between invocations. Meanwhile, the short execution time of a function impedes amortization of the warm-up latency of the cache hierarchy, causing a 31--114% increase in CPI compared to execution with warm microarchitectural state. We identify on-chip misses for instructions as a major contributor to the performance loss. In response we propose Jukebox, a record-and-replay instruction prefetcher specifically designed for reducing the start-up latency of warm function instances. Jukebox requires just 32KB of metadata per function instance and boosts performance by an average of 18.7% for a wide range of functions, which translates into a corresponding throughput improvement.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

无服务器功能:表征和优化

无服务器计算已经成为在云中运行服务的一种广泛使用的范例。在无服务器中，开发人员将其应用程序组织为一组函数，这些函数在响应事件(如HTTP请求)时按需调用。为了避免启动新功能实例的长时间启动延迟，云提供商倾向于在最近的调用之后将最近触发的实例保持空闲(或热)一段时间，以预测未来的调用。因此，在服务器上的任何给定时刻，可能有数千个各种函数的热实例，它们的执行根据传入调用在时间上交错进行。本文观察到:(1)给定服务器上的热实例之间存在高度交错;(2)单个暖函数调用频率相对较低，通常以秒或分为粒度;(3)许多函数调用在几毫秒内完成。在服务器上交错执行很少调用的函数会导致调用之间每个函数的微体系结构状态的波动。与此同时，函数的短执行时间阻碍了缓存层次结构的预热延迟的摊销，导致CPI比在热微架构状态下的执行增加了31- 114%。我们认为芯片上的指令缺失是造成性能损失的主要原因。作为回应，我们提出了Jukebox，这是一个专门为减少热函数实例的启动延迟而设计的记录和重播指令预取器。Jukebox每个函数实例只需要32KB的元数据，对于各种功能，它的性能平均提高了18.7%，这转化为相应的吞吐量提高。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the 49th Annual International Symposium on Computer Architecture

自引率

0.00%

发文量