Reducing Imbalance Ratio in MapReduce
Hsing-Lung Chen, Y. Shen
2017 IEEE 7th International Symposium on Cloud and Service Computing (SC2), published 2017-11-01
DOI: 10.1109/SC2.2017.54
To speed up processing, MapReduce runs many mappers and reducers concurrently. Each mapper sends its intermediate map-outputs to reducers according to the key of each record. When big data is skewed, some partitions receive a disproportionately large share of the data, so the reducers assigned to those partitions take longer to finish, which increases the total execution time. This paper proposes a balanced partition method that divides the intermediate map-outputs evenly among partitions. The method first runs a preprocessing MapReduce job (mapper1 and reducer1) that derives the partitioner. In this job, mapper1 counts key frequencies efficiently using a trie data structure. reducer1 then uses the aggregated key frequencies to derive many sub-partitions delimited by cut-points and distributes these sub-partitions evenly across the partitions. Every mapper of the application MapReduce job uses the cut-points and the resulting mapping table to partition its intermediate map-outputs evenly, which reduces the execution time.
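The two-phase scheme described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. It rests on several assumptions. Keys are strings. A cut-point is taken to be the last key of a contiguous range in sorted key order, and ranges are closed once they hold about total/`num_subparts` records. Sub-partitions are assigned to reducers with a greedy heaviest-first, least-loaded-reducer heuristic. The names `count_keys`, `trie_items`, `merge_counts`, `derive_partitioner`, `partition`, `num_subparts`, and `num_reducers` are hypothetical.

```python
import heapq
from bisect import bisect_left
from collections import Counter


class TrieNode:
    """A character-trie node; `count` is how often the key ending here was seen."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0


def count_keys(keys):
    """mapper1 (sketch): count key frequencies in a trie, so keys that
    share a prefix also share nodes."""
    root = TrieNode()
    for key in keys:
        node = root
        for ch in key:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = TrieNode()
            node = child
        node.count += 1
    return root


def trie_items(node, prefix=""):
    """Depth-first walk yielding (key, count) pairs in sorted key order."""
    if node.count:
        yield prefix, node.count
    for ch in sorted(node.children):
        yield from trie_items(node.children[ch], prefix + ch)


def merge_counts(partial_counts):
    """reducer1 input (sketch): sum the per-mapper (key, count) lists."""
    total = Counter()
    for part in partial_counts:
        total.update(dict(part))  # a mapping argument adds counts per key
    return total


def derive_partitioner(freqs, num_subparts, num_reducers):
    """reducer1 (assumed scheme): cut the sorted key space into contiguous
    sub-partitions of roughly equal record count, then build a mapping
    table that gives each sub-partition, heaviest first, to the currently
    least-loaded reducer."""
    target = sum(freqs.values()) / num_subparts
    cut_points, sizes, acc = [], [], 0
    for key in sorted(freqs):
        acc += freqs[key]
        if acc >= target and len(cut_points) < num_subparts - 1:
            cut_points.append(key)  # inclusive upper bound of this range
            sizes.append(acc)
            acc = 0
    sizes.append(acc)  # last range: every key above the final cut-point
    heap = [(0, r) for r in range(num_reducers)]  # (load, reducer id)
    mapping = [0] * len(sizes)
    for sp in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        load, r = heapq.heappop(heap)
        mapping[sp] = r
        heapq.heappush(heap, (load + sizes[sp], r))
    return cut_points, mapping


def partition(key, cut_points, mapping):
    """Application mapper's partitioner: binary-search the cut-points for
    the key's sub-partition, then look up its reducer in the mapping table."""
    return mapping[bisect_left(cut_points, key)]


# Two mapper splits with a skewed key "a".
mapper_inputs = [["a", "a", "a", "b", "c", "d", "e", "f"]] * 2
freqs = merge_counts(trie_items(count_keys(split)) for split in mapper_inputs)
cut_points, mapping = derive_partitioner(freqs, num_subparts=4, num_reducers=2)
loads = [0, 0]
for key, count in freqs.items():
    loads[partition(key, cut_points, mapping)] += count
print(cut_points, mapping, loads)  # ['a', 'c', 'e'] [0, 1, 1, 0] [8, 8]
```

In this example there are six records with key `a` and two with each of `b` through `f`. The sketch yields four sub-partitions of sizes 6, 4, 4, and 2, and the mapping table places 8 records on each of the two reducers. Keys not seen during preprocessing still receive a reducer through the same binary search. Because cut-points fall between distinct keys, this sketch cannot split a single key whose frequency exceeds the per-reducer target; such a key stays in one sub-partition.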