Flint:在瞬态服务器上进行批量交互的数据密集型处理

Proceedings of the Eleventh European Conference on Computer Systems Pub Date : 2016-04-18 DOI:10.1145/2901318.2901319

Prateek Sharma, Tian Guo, Xin He, David E. Irwin, P. Shenoy

{"title":"Flint:在瞬态服务器上进行批量交互的数据密集型处理","authors":"Prateek Sharma, Tian Guo, Xin He, David E. Irwin, P. Shenoy","doi":"10.1145/2901318.2901319","DOIUrl":null,"url":null,"abstract":"Cloud providers now offer transient servers, which they may revoke at anytime, for significantly lower prices than on-demand servers, which they cannot revoke. The low price of transient servers is particularly attractive for executing an emerging class of workload, which we call Batch-Interactive Data-Intensive (BIDI), that is becoming increasingly important for data analytics. BIDI workloads require large sets of servers to cache massive datasets in memory to enable low latency operation. In this paper, we illustrate the challenges of executing BIDI workloads on transient servers, where revocations (akin to failures) are the common case. To address these challenges, we design Flint, which is based on Spark and includes automated checkpointing and server selection policies that i) support batch and interactive applications and ii) dynamically adapt to application characteristics. We evaluate a prototype of Flint using EC2 spot instances, and show that it yields cost savings of up to 90% compared to using on-demand servers, while increasing running time by < 2%.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"139 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"88","resultStr":"{\"title\":\"Flint: batch-interactive data-intensive processing on transient servers\",\"authors\":\"Prateek Sharma, Tian Guo, Xin He, David E. Irwin, P. Shenoy\",\"doi\":\"10.1145/2901318.2901319\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cloud providers now offer transient servers, which they may revoke at anytime, for significantly lower prices than on-demand servers, which they cannot revoke. The low price of transient servers is particularly attractive for executing an emerging class of workload, which we call Batch-Interactive Data-Intensive (BIDI), that is becoming increasingly important for data analytics. BIDI workloads require large sets of servers to cache massive datasets in memory to enable low latency operation. In this paper, we illustrate the challenges of executing BIDI workloads on transient servers, where revocations (akin to failures) are the common case. To address these challenges, we design Flint, which is based on Spark and includes automated checkpointing and server selection policies that i) support batch and interactive applications and ii) dynamically adapt to application characteristics. We evaluate a prototype of Flint using EC2 spot instances, and show that it yields cost savings of up to 90% compared to using on-demand servers, while increasing running time by < 2%.\",\"PeriodicalId\":20737,\"journal\":{\"name\":\"Proceedings of the Eleventh European Conference on Computer Systems\",\"volume\":\"139 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-04-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"88\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Eleventh European Conference on Computer Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2901318.2901319\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Eleventh European Conference on Computer Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2901318.2901319","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 88

摘要

云提供商现在提供临时服务器，他们可以随时撤销，价格远低于他们不能撤销的按需服务器。临时服务器的低价格对于执行一类新兴的工作负载特别有吸引力，我们称之为批处理交互式数据密集型工作负载(BIDI)，它对数据分析变得越来越重要。BIDI工作负载需要大量服务器在内存中缓存大量数据集，以实现低延迟操作。在本文中，我们将说明在临时服务器上执行BIDI工作负载的挑战，其中撤销(类似于故障)是常见的情况。为了应对这些挑战，我们设计了Flint，它基于Spark，包括自动检查点和服务器选择策略，这些策略i)支持批处理和交互式应用程序，ii)动态适应应用程序的特征。我们使用EC2现场实例评估了Flint的原型，结果表明，与使用按需服务器相比，它可以节省高达90%的成本，而运行时间却增加了不到2%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Flint: batch-interactive data-intensive processing on transient servers

Cloud providers now offer transient servers, which they may revoke at anytime, for significantly lower prices than on-demand servers, which they cannot revoke. The low price of transient servers is particularly attractive for executing an emerging class of workload, which we call Batch-Interactive Data-Intensive (BIDI), that is becoming increasingly important for data analytics. BIDI workloads require large sets of servers to cache massive datasets in memory to enable low latency operation. In this paper, we illustrate the challenges of executing BIDI workloads on transient servers, where revocations (akin to failures) are the common case. To address these challenges, we design Flint, which is based on Spark and includes automated checkpointing and server selection policies that i) support batch and interactive applications and ii) dynamically adapt to application characteristics. We evaluate a prototype of Flint using EC2 spot instances, and show that it yields cost savings of up to 90% compared to using on-demand servers, while increasing running time by < 2%.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the Eleventh European Conference on Computer Systems

自引率

0.00%

发文量