An Improved Task Scheduling Algorithm Based on Cache Locality and Data Locality in Hadoop
P. Zhang, Chunlin Li, Yahui Zhao
2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2016-12-01
DOI: 10.1109/PDCAT.2016.060 (https://doi.org/10.1109/PDCAT.2016.060)
Citations: 5
Abstract
Optimizing task scheduling in the Hadoop environment is an important research topic, because scheduling decisions affect both system performance and resource utilization. Existing task scheduling algorithms do not take the cache level into account, which significantly degrades task performance. This paper therefore proposes an improved task scheduling algorithm based on cache locality and data locality. First, a section matrix and a weighted bipartite graph are constructed from the relation between resources and tasks. Bipartite graph matching is then used to schedule map tasks so as to optimize cache locality and data locality and to reduce the amount of data transferred during task execution. Experimental results show that the proposed algorithm effectively improves data locality and system performance and outperforms the two other algorithms it is compared with.
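The idea of matching map tasks to nodes through a weighted bipartite graph can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the weights, the construction of the section matrix, or the matching procedure. The weight values (3 for a cache hit, 2 for data stored on the node's disk, 0 for remote data), the helper names, and the exhaustive search used for the matching are all assumptions chosen for illustration, and the search is only practical for a handful of tasks.

```python
from itertools import permutations

# Hypothetical edge weights (not taken from the paper); higher is better.
CACHE_LOCAL, DATA_LOCAL, REMOTE = 3, 2, 0


def locality_weight(task, node, cached, stored):
    """Weight of the edge (task, node) in the bipartite graph."""
    if task in cached.get(node, set()):
        return CACHE_LOCAL   # input is already in the node's cache
    if task in stored.get(node, set()):
        return DATA_LOCAL    # input block is on the node's disk
    return REMOTE            # input must be fetched over the network


def build_weight_matrix(tasks, nodes, cached, stored):
    """Rows are tasks, columns are nodes."""
    return [[locality_weight(t, n, cached, stored) for n in nodes]
            for t in tasks]


def best_matching(weights):
    """Exhaustive maximum-weight matching.

    Assumes there are at least as many nodes as tasks and that each
    node runs at most one task. Stands in for a proper matching
    algorithm; the cost grows factorially with the input size.
    """
    n_tasks, n_nodes = len(weights), len(weights[0])
    best_total, best_assign = -1, None
    for perm in permutations(range(n_nodes), n_tasks):
        total = sum(weights[t][perm[t]] for t in range(n_tasks))
        if total > best_total:
            best_total, best_assign = total, perm
    return best_total, list(best_assign)


# Toy cluster: which task inputs each node has cached or stored on disk.
tasks = ["t0", "t1", "t2"]
nodes = ["n0", "n1", "n2"]
cached = {"n0": {"t0", "t1"}, "n1": {"t2"}}
stored = {"n1": {"t1"}, "n2": {"t0", "t2"}}

weights = build_weight_matrix(tasks, nodes, cached, stored)
total, assignment = best_matching(weights)
schedule = {tasks[t]: nodes[assignment[t]] for t in range(len(tasks))}
print(weights)
print(total, schedule)
```

In this toy input, both t0 and t1 have their data cached on n0. A greedy scheduler that gives n0 to t0 would reach a total weight of 7 at best, while the matching places t0 on n2 (disk-local) so that t1 and t2 both run cache-local, for a total of 8.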