Self-adaptive Executors for Big Data Processing

Proceedings of the 20th International Middleware Conference; Pub Date: 2019-12-09; DOI: 10.1145/3361525.3361545

Sobhan Omranian Khorasani, Jan S. Rellermeyer, D. Epema


Citations: 5

Abstract

The demand for additional performance, driven by the rapid increase in the size and importance of data-intensive applications, has considerably elevated the complexity of computer architecture. In response, systems offer pre-determined behaviors based on heuristics and then expose a large number of configuration parameters that operators can adjust to their particular infrastructure. Unfortunately, in practice this leads to substantial manual tuning effort. In this work, we focus on one of the most impactful tuning decisions in big data systems: the number of executor threads. We first show the impact of I/O contention on the runtime of workloads, along with a simple static solution that reduces the number of threads for I/O-bound phases. We then present a more elaborate solution in the form of self-adaptive executors, which continuously monitor the underlying system resources and detect contention. This enables the executors to tune their thread pool size dynamically at runtime in order to achieve the best performance. Our experimental results show that being adaptive can significantly reduce execution time, especially in I/O-intensive applications such as Terasort and PageRank, which see a 34% and 54% reduction in runtime, respectively.
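To make the idea of an executor that resizes its thread pool in response to detected contention more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a single contention signal (the fraction of time spent waiting on I/O) and a simple rule: halve the pool when the signal is high, grow it by one thread when the signal is low. The class name `AdaptivePoolController`, the thresholds, and the halving rule are all hypothetical choices made for illustration; the abstract does not specify which metrics or policy the paper uses.

```python
from dataclasses import dataclass


@dataclass
class AdaptivePoolController:
    """Hypothetical controller that picks a thread-pool size from an I/O-wait signal."""

    min_threads: int = 1
    max_threads: int = 32
    high_iowait: float = 0.30  # assumed threshold: above this, treat I/O as contended
    low_iowait: float = 0.10   # assumed threshold: below this, assume spare I/O capacity
    size: int = 32             # current pool size, starting at the static default

    def update(self, iowait: float) -> int:
        """Return the new pool size given the latest I/O-wait fraction (0.0 to 1.0)."""
        if iowait > self.high_iowait:
            # Contention detected: halve the number of threads, but keep at least the minimum.
            self.size = max(self.min_threads, self.size // 2)
        elif iowait < self.low_iowait:
            # No contention: grow back one thread at a time, up to the maximum.
            self.size = min(self.max_threads, self.size + 1)
        # Between the two thresholds the size is left unchanged.
        return self.size


if __name__ == "__main__":
    controller = AdaptivePoolController()
    # Made-up I/O-wait samples standing in for periodic resource monitoring.
    for sample in [0.05, 0.45, 0.50, 0.20, 0.05]:
        print(f"iowait={sample:.2f} -> pool size {controller.update(sample)}")
```

In a real executor, the returned size would be applied to the worker pool after each monitoring interval. The gap between the two thresholds is there to keep the pool from oscillating when the signal hovers near a single cut-off.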

