A Case for Abstract Cost Models for Distributed Execution of Analytics Operators.

Big data analytics and knowledge discovery : 19th International Conference, DaWaK 2017, Lyon, France, August 28-31, 2017, Proceedings. DaWaK (Conference) (19th : 2017 : Lyon, France). Pub Date: 2017-08-01. Epub Date: 2017-08-03. DOI: 10.1007/978-3-319-64283-3_11

Rundong Li, Ningfang Mi, Mirek Riedewald, Yizhou Sun, Yi Yao

Volume 10440, pages 149-163.

Citation count: 1

Abstract

We consider data analytics workloads on distributed architectures, in particular clusters of commodity machines. Finding a job partitioning that minimizes running time requires a cost model, which we more accurately refer to as a makespan model. In attempting to find the simplest possible model that is still sufficiently accurate, we explore piecewise linear functions of input, output, and computational complexity. These models are abstract in the sense that they capture fundamental algorithm properties but do not require explicit modeling of system and implementation details such as the number of disk accesses. We show how the simplified functional structure can be exploited by directly integrating the model into the makespan optimization process, reducing complexity by orders of magnitude. Experimental results provide evidence of good prediction quality and successful makespan optimization across a variety of cluster architectures.
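
As a rough illustration of what a piecewise linear makespan model of this kind might look like, the following Python sketch predicts the running time of one task from its input size, output size, and amount of computation, and takes the makespan as the load of the slowest worker. The abstract does not give the model's actual form, so everything here is an assumption made for illustration: the names `Piece`, `task_time`, and `makespan`, the choice of breakpoints on input size, the additive per-worker load, and all coefficients are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# (input size, output size, computation) for one task; units are arbitrary.
Task = Tuple[float, float, float]


@dataclass(frozen=True)
class Piece:
    """One linear piece of a hypothetical model, used while input <= upper_bound."""
    upper_bound: float  # breakpoint on input size (assumed; not from the paper)
    b0: float           # fixed overhead
    b_in: float         # time per unit of input
    b_out: float        # time per unit of output
    b_cpu: float        # time per unit of computation


def task_time(pieces: Sequence[Piece], inp: float, out: float, cpu: float) -> float:
    """Predicted time of one task; pieces must be sorted by upper_bound."""
    for p in pieces:
        if inp <= p.upper_bound:
            return p.b0 + p.b_in * inp + p.b_out * out + p.b_cpu * cpu
    raise ValueError("input size exceeds the modeled range")


def makespan(pieces: Sequence[Piece], assignment: Sequence[List[Task]]) -> float:
    """Makespan of a partitioning: the largest total predicted time on any worker."""
    loads = (sum(task_time(pieces, *t) for t in tasks) for tasks in assignment)
    return max(loads, default=0.0)


if __name__ == "__main__":
    # Made-up coefficients: a cheap regime for small inputs, a costlier one beyond.
    model = [
        Piece(upper_bound=100.0, b0=1.0, b_in=0.5, b_out=0.25, b_cpu=0.125),
        Piece(upper_bound=float("inf"), b0=5.0, b_in=1.0, b_out=0.25, b_cpu=0.125),
    ]
    balanced = [[(10.0, 4.0, 16.0), (10.0, 4.0, 16.0)], [(20.0, 8.0, 32.0)]]
    print(makespan(model, balanced))
```

Because each piece is linear in its three arguments, comparing candidate partitionings only requires evaluating a handful of multiplications and additions per task, which is the kind of structural simplicity the abstract suggests can be exploited during optimization.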


