An Improved Task Scheduling Algorithm Based on Cache Locality and Data Locality in Hadoop

P. Zhang, Chunlin Li, Yahui Zhao
2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), December 2016
DOI: 10.1109/PDCAT.2016.060
Optimizing task scheduling in a Hadoop environment is an important research topic, because scheduling decisions affect both system performance and resource utilization. Existing task scheduling algorithms do not take the cache level into account, which strongly affects task performance. This paper therefore proposes an improved task scheduling algorithm based on cache locality and data locality. First, a section matrix and a weighted bipartite graph are constructed from the relationships between resources and tasks. Then, bipartite graph matching is used to schedule map tasks so as to improve cache locality and data locality and to reduce the amount of data transferred during task execution. Experimental results show that the proposed algorithm effectively improves data locality and system performance, outperforming the two algorithms it is compared against.
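The abstract does not give the edge weights or the exact matching procedure, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each map task reads one input block and each free map slot is a vertex on the resource side. Edge weights are hypothetical scores: 3 if the slot's node caches the block, 2 if it stores the block on local disk, and 0 otherwise. The maximum-weight matching is computed with the Kuhn-Munkres (Hungarian) algorithm. The names `build_weight_matrix`, `max_weight_assignment`, and the score constants are illustrative and do not come from the paper. The code does not use any Hadoop API.

```python
# Hypothetical sketch: map-task placement as maximum-weight bipartite matching.
# Scores, names, and data layout are illustrative assumptions, not the paper's.

# Assumed locality scores: cached block > block on local disk > remote read.
CACHE_LOCAL = 3
DATA_LOCAL = 2
REMOTE = 0


def build_weight_matrix(tasks, slots, cached, stored):
    """Return weights[i][j], the score of running task i on slot j.

    tasks:  input block id of each map task (one block per task, assumed)
    slots:  node name of each free map slot (a node may appear several times)
    cached: node -> set of block ids held in that node's cache
    stored: node -> set of block ids stored on that node's disk
    """
    weights = []
    for block in tasks:
        row = []
        for node in slots:
            if block in cached.get(node, set()):
                row.append(CACHE_LOCAL)
            elif block in stored.get(node, set()):
                row.append(DATA_LOCAL)
            else:
                row.append(REMOTE)
        weights.append(row)
    return weights


def max_weight_assignment(weights):
    """Kuhn-Munkres (Hungarian) algorithm, O(n^2 * m) for n tasks <= m slots.

    Minimizes the negated weights, i.e. maximizes the total score.
    Returns assign, where assign[i] is the slot index given to task i.
    """
    n = len(weights)
    m = len(weights[0]) if n else 0
    if n > m:
        raise ValueError("need at least as many free slots as tasks")
    inf = float("inf")
    u = [0.0] * (n + 1)  # task potentials (1-indexed)
    v = [0.0] * (m + 1)  # slot potentials (1-indexed)
    p = [0] * (m + 1)    # p[j] = task matched to slot j, 0 = unmatched
    way = [0] * (m + 1)  # previous slot on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free slot is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -weights[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path back to the root
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign


if __name__ == "__main__":
    tasks = ["b1", "b2", "b3"]
    slots = ["nodeA", "nodeB", "nodeC"]
    cached = {"nodeA": {"b2"}}
    stored = {"nodeA": {"b1", "b2"}, "nodeB": {"b1"}, "nodeC": {"b3"}}
    weights = build_weight_matrix(tasks, slots, cached, stored)
    assign = max_weight_assignment(weights)
    for i, j in enumerate(assign):
        print(tasks[i], "->", slots[j], "score", weights[i][j])
```

With these inputs the matching places b2 on nodeA (a cache hit), b1 on nodeB, and b3 on nodeC, for a total score of 7. A naive pass that handles b1 first and puts it on the first node holding its block (nodeA) would leave b2 without its cached copy, for a total of at most 4. This is why solving the assignment jointly can pay off. In a real scheduler, the weights would come from whatever cache and block-location information the framework exposes. That part is outside this sketch.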