Performance Analysis of Scheduling Algorithms in Apache Hadoop
Yang Li
2020 16th International Conference on Computational Intelligence and Security (CIS), 2020-11-01
DOI: 10.1109/CIS52066.2020.00040 (https://doi.org/10.1109/CIS52066.2020.00040)
For resource management, Hadoop bundles memory and CPU into a single unit and divides these units by task type into two resource models: MapSlot and ReduceSlot. MapReduce applications perform a large number of sorting operations at run time, and most of these sorts are executed iteratively, which is costly. Chapter 5 of this article takes this as its starting point and reorganizes the execution process of the Shuffle stage. It investigates replacing quick sort with the more efficient counting sort. In addition, the Shuffle execution is split into two branches depending on whether a Combiner is defined. One branch removes the quick sort within each partition in the spill phase and the merge sort in the combine phase to reduce overhead. The other branch executes the Combiner earlier to improve data processing efficiency. Both branches were used to process 21 GB of log data on a 7-node PC cluster, and each achieved an improvement of about half an hour in running time.
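The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names, not the paper's code. It assumes records are hypothetical (partition id, key, value) tuples and that partition ids are small integers in the range 0 to num_partitions - 1. The first function groups records by partition with a stable counting sort in O(n + k) time, in place of a comparison sort. The second shows what applying a sum-style Combiner early, before any sorting, might look like. The function names and the record layout are illustrative assumptions, not Hadoop APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (partition id, key, value).
Record = Tuple[int, str, int]


def counting_sort_by_partition(records: List[Record], num_partitions: int) -> List[Record]:
    """Stable counting sort of records by partition id, O(n + k)."""
    # Count how many records fall into each partition.
    counts = [0] * num_partitions
    for part, _, _ in records:
        counts[part] += 1

    # Prefix sums give the first output index of each partition.
    starts = [0] * num_partitions
    total = 0
    for p in range(num_partitions):
        starts[p] = total
        total += counts[p]

    # Place each record at the next free index of its partition.
    out: List[Record] = list(records)  # same length; every index is overwritten
    for rec in records:
        part = rec[0]
        out[starts[part]] = rec
        starts[part] += 1
    return out


def combine_early(records: List[Record]) -> List[Record]:
    """Sum values per (partition, key) before any sorting takes place."""
    sums: Dict[Tuple[int, str], int] = {}
    for part, key, value in records:
        sums[(part, key)] = sums.get((part, key), 0) + value
    # dicts preserve insertion order, so output follows first appearance.
    return [(part, key, value) for (part, key), value in sums.items()]


if __name__ == "__main__":
    sample = [(2, "b", 1), (0, "a", 1), (1, "c", 1), (0, "d", 1), (2, "b", 1)]
    print(counting_sort_by_partition(sample, 3))
    print(combine_early(sample))
```

Counting sort fits here only because the number of partitions is small and known in advance. Within a partition the records keep their arrival order, so any key ordering needed downstream would still have to be established separately.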