A Hard Real-time Scheduler for Spark on YARN

2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) Pub Date: 2018-05-01 DOI: 10.1109/CCGRID.2018.00096

Guolu Wang, Jungang Xu, Renfeng Liu, Shanshan Huang


Citations: 6

Abstract

Apache Spark is a fast, general-purpose engine for large-scale data processing that uses distributed memory. It offers several deploy modes to meet the needs of different users, of which Spark on YARN is the most popular. Each deploy mode has its own scheduling mechanism. Spark on YARN provides three schedulers: the FIFO Scheduler, the Fair Scheduler, and the Capacity Scheduler. None of the three, however, suits hard real-time application scenarios, and as Apache Spark is applied more widely, the demand for hard real-time scheduling will grow quickly. In this paper, we propose a novel hard real-time scheduling algorithm called DVDA (Deadline and Value Density-Aware) to meet this need. Whereas the traditional EDF (Earliest Deadline First) algorithm considers only the deadline, DVDA considers both the deadline and the value density of an application. We further implement a DVDA Scheduler for Spark on YARN based on the algorithm and conduct experiments to verify its effectiveness. Experimental results show that, compared with the default Capacity Scheduler and the EDF-Capacity scheduler, the proposed algorithm increases the application completion rate by 18% and 6%, and the Value Income by 78% and 32%, respectively.
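
The contrast between EDF and a scheduler that considers both deadline and value density can be illustrated with a toy simulation. The sketch below is not the DVDA algorithm from the paper, whose exact priority formula the abstract does not give. It assumes a single non-preemptive resource, defines value density as value divided by estimated execution time, and uses a hypothetical score (value density divided by time remaining until the deadline). The names `App`, `simulate`, `pick_edf`, and `pick_dvda_like` are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class App:
    name: str
    deadline: float   # absolute deadline, in seconds from time 0
    exec_time: float  # estimated execution time, in seconds
    value: float      # value earned only if the app finishes by its deadline


def value_density(app: App) -> float:
    # Assumed definition: value earned per second of execution.
    return app.value / app.exec_time


def pick_edf(feasible: List[App], now: float) -> App:
    # EDF looks only at the deadline.
    return min(feasible, key=lambda a: a.deadline)


def pick_dvda_like(feasible: List[App], now: float) -> App:
    # Hypothetical combined score, NOT the paper's formula:
    # higher value density and a nearer deadline both raise priority.
    return max(feasible, key=lambda a: value_density(a) / (a.deadline - now))


def simulate(apps: List[App],
             pick: Callable[[List[App], float], App]) -> Tuple[List[str], float]:
    """Run apps one at a time; an app counts only if it meets its deadline."""
    now, done, income = 0.0, [], 0.0
    pending = list(apps)
    while pending:
        # Hard real-time: drop from consideration anything that would finish late.
        feasible = [a for a in pending if now + a.exec_time <= a.deadline]
        if not feasible:
            break
        chosen = pick(feasible, now)
        pending.remove(chosen)
        now += chosen.exec_time
        done.append(chosen.name)
        income += chosen.value
    return done, income


apps = [
    App("A", deadline=4.0, exec_time=4.0, value=1.0),
    App("B", deadline=5.0, exec_time=2.0, value=10.0),
    App("C", deadline=7.0, exec_time=2.0, value=8.0),
]

edf_result = simulate(apps, pick_edf)          # (['A', 'C'], 9.0)
dvda_result = simulate(apps, pick_dvda_like)   # (['B', 'C'], 18.0)
print("EDF:      ", edf_result)
print("DVDA-like:", dvda_result)
```

In this made-up workload both policies complete two applications, but EDF spends its first four seconds on the low-value app A and misses B, while the density-aware policy earns twice the total value. The numbers are illustrative only and are unrelated to the experimental results reported in the abstract.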



