{"title":"YARN Schedulers for Hadoop MapReduce Jobs: Design Goals, Issues and Taxonomy","authors":"Gnanendra Kotikam, S. Lokesh","doi":"10.2174/2666255816666220831125012","DOIUrl":null,"url":null,"abstract":"\n\nBig Data processing is a demanding task, and several big data processing frameworks have emerged during recent decades. The performance of these frameworks greatly dependent on resource management models.\n\n\n\nYARN is one of such models which acts as a resource management layer and provides computational resources for execution engines (Spark, MapReduce, storm, etc.) through its schedulers. The most important aspect of resource management is job scheduling.\n\n\n\nIn this paper, we first present the design goal of YARN real-life schedulers (FIFO, Capacity, and Fair) for the MapReduce engine. Later, we discuss the scheduling issues of the Hadoop MapReduce cluster.\n\n\n\nMany efforts have been carried out in the literature to address issues of data locality, heterogeneity, straggling, skew mitigation, stragglers and fairness in Hadoop MapReduce scheduling. Lastly, we present the taxonomy of different scheduling algorithms available in the literature based on some factors like environment, scope, approach, objective and addressed issues.\n","PeriodicalId":36514,"journal":{"name":"Recent Advances in Computer Science and Communications","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Recent Advances in Computer Science and Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2174/2666255816666220831125012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 0
Abstract
Big Data processing is a demanding task, and several big data processing frameworks have emerged during recent decades. The performance of these frameworks greatly dependent on resource management models.
YARN is one of such models which acts as a resource management layer and provides computational resources for execution engines (Spark, MapReduce, storm, etc.) through its schedulers. The most important aspect of resource management is job scheduling.
In this paper, we first present the design goal of YARN real-life schedulers (FIFO, Capacity, and Fair) for the MapReduce engine. Later, we discuss the scheduling issues of the Hadoop MapReduce cluster.
Many efforts have been carried out in the literature to address issues of data locality, heterogeneity, straggling, skew mitigation, stragglers and fairness in Hadoop MapReduce scheduling. Lastly, we present the taxonomy of different scheduling algorithms available in the literature based on some factors like environment, scope, approach, objective and addressed issues.