A Case for Abstract Cost Models for Distributed Execution of Analytics Operators
Rundong Li, Ningfang Mi, Mirek Riedewald, Yizhou Sun, Yi Yao
Big Data Analytics and Knowledge Discovery: 19th International Conference, DaWaK 2017, Lyon, France, August 28-31, 2017, Proceedings. Volume 10440, pp. 149-163. Published August 2017.
DOI: 10.1007/978-3-319-64283-3_11
Abstract
We consider data analytics workloads on distributed architectures, in particular clusters of commodity machines. To find a job partitioning that minimizes running time, a cost model is needed; we more accurately refer to it as a makespan model. In search of the simplest model that is still sufficiently accurate, we explore piecewise linear functions of input, output, and computational complexity. These functions are abstract in the sense that they capture fundamental algorithm properties without requiring explicit modeling of system and implementation details such as the number of disk accesses. We show how this simplified functional structure can be exploited by integrating the model directly into the makespan optimization process, reducing complexity by orders of magnitude. Experimental results provide evidence of good prediction quality and successful makespan optimization across a variety of cluster architectures.
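The abstract gives no formulas, so the following Python sketch only illustrates the general idea under stated assumptions. It models a task's running time as a linear function of its input size, output size, and a computational-complexity term, with a second linear piece that takes over above a hypothetical input threshold (for example, the point at which a task's data no longer fits in memory). It then uses that model to pick the number of tasks that minimizes predicted makespan. All names (LinearPiece, PiecewiseLinearModel, best_task_count), the coefficients, the even-split assumption, and the per-task overhead term are hypothetical and are not the authors' model. The search is a plain brute-force scan, not the optimization procedure the paper describes.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LinearPiece:
    """Hypothetical piece: seconds = intercept + a*input + b*output + c*work."""
    intercept: float
    a: float  # seconds per input record
    b: float  # seconds per output record
    c: float  # seconds per unit of computational work

    def cost(self, inp: float, out: float, work: float) -> float:
        return self.intercept + self.a * inp + self.b * out + self.c * work


@dataclass(frozen=True)
class PiecewiseLinearModel:
    threshold: float                       # assumed per-task input size where the pieces switch
    small: LinearPiece                     # used while inp <= threshold
    large: LinearPiece                     # used once inp > threshold
    complexity: Callable[[float], float]   # operator's work as a function of input size

    def task_time(self, inp: float, out: float) -> float:
        piece = self.small if inp <= self.threshold else self.large
        return piece.cost(inp, out, self.complexity(inp))


def predicted_makespan(model: PiecewiseLinearModel, total_in: float, total_out: float,
                       p: int, overhead_per_task: float) -> float:
    # Assumes an even split into p tasks on p workers, so every task has the same
    # predicted time; a hypothetical overhead grows linearly with the task count.
    return model.task_time(total_in / p, total_out / p) + overhead_per_task * p


def best_task_count(model: PiecewiseLinearModel, total_in: float, total_out: float,
                    max_tasks: int, overhead_per_task: float) -> int:
    # Brute-force scan over candidate task counts 1..max_tasks.
    return min(range(1, max_tasks + 1),
               key=lambda p: predicted_makespan(model, total_in, total_out, p, overhead_per_task))


# Usage with made-up coefficients for a sort-like operator (work ~ n log n).
sort_model = PiecewiseLinearModel(
    threshold=1e6,
    small=LinearPiece(intercept=0.5, a=1e-6, b=1e-6, c=5e-8),
    large=LinearPiece(intercept=0.5, a=4e-6, b=4e-6, c=5e-8),
    complexity=lambda n: n * math.log2(n) if n > 1 else 0.0,
)
print(best_task_count(sort_model, total_in=1e8, total_out=1e8,
                      max_tasks=200, overhead_per_task=0.05))  # prints 100
```

With these invented coefficients the optimum lands exactly on the breakpoint: 100 is the smallest task count whose tasks stay on the cheaper piece. This hints at why a piecewise linear structure can narrow the search, though the sketch itself does not exploit it.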