YARN Schedulers for Hadoop MapReduce Jobs: Design Goals, Issues and Taxonomy

Recent Advances in Computer Science and Communications (Q3, Computer Science) Pub Date: 2022-08-31 DOI: 10.2174/2666255816666220831125012

Gnanendra Kotikam, S. Lokesh

Citations: 0

Abstract

Big Data processing is a demanding task, and several big data processing frameworks have emerged in recent decades. The performance of these frameworks depends greatly on their resource management models. YARN is one such model: it acts as a resource management layer and provides computational resources to execution engines (Spark, MapReduce, Storm, etc.) through its schedulers. The most important aspect of resource management is job scheduling. In this paper, we first present the design goals of YARN's real-life schedulers (FIFO, Capacity, and Fair) for the MapReduce engine. Next, we discuss the scheduling issues of Hadoop MapReduce clusters. Many efforts have been made in the literature to address the issues of data locality, heterogeneity, stragglers, skew mitigation, and fairness in Hadoop MapReduce scheduling. Lastly, we present a taxonomy of the scheduling algorithms available in the literature, based on factors such as environment, scope, approach, objective, and addressed issues.
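As a rough illustration of how the FIFO and Fair policies named above differ, the toy sketch below divides a fixed pool of containers among competing jobs. It is not YARN's implementation: the function names, the job tuples, and the single "container" resource are all assumptions made for illustration, and queues, memory/vcore dimensions, and data locality are left out.

```python
from typing import Dict, List, Tuple

Job = Tuple[str, int]  # (hypothetical job name, containers requested)


def fifo_allocate(jobs: List[Job], capacity: int) -> Dict[str, int]:
    """Serve jobs strictly in arrival order until containers run out."""
    allocation: Dict[str, int] = {}
    remaining = capacity
    for name, demand in jobs:
        granted = min(demand, remaining)
        allocation[name] = granted
        remaining -= granted
    return allocation


def fair_allocate(jobs: List[Job], capacity: int) -> Dict[str, int]:
    """Integer max-min fair share: repeatedly split the remaining
    containers equally among jobs whose demand is not yet met."""
    demand = dict(jobs)
    allocation = {name: 0 for name, _ in jobs}
    remaining = capacity
    active = [name for name, d in jobs if d > 0]
    while remaining > 0 and active:
        share = max(remaining // len(active), 1)
        for name in list(active):  # iterate over a copy; we mutate `active`
            if remaining == 0:
                break
            granted = min(share, demand[name] - allocation[name], remaining)
            allocation[name] += granted
            remaining -= granted
            if allocation[name] == demand[name]:
                active.remove(name)
    return allocation


# Three jobs compete for 10 containers.
jobs = [("A", 8), ("B", 3), ("C", 5)]
fifo_result = fifo_allocate(jobs, 10)
fair_result = fair_allocate(jobs, 10)
```

Under FIFO the first job takes most of the pool and the last job receives nothing, while the fair policy gives every job a comparable share; this starvation-versus-fairness trade-off is one of the scheduling issues the paper surveys.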


