Communication optimisation for intermediate data of MapReduce computing model

Yunpeng Cao, Haifeng Wang
Int. J. Comput. Sci. Eng. (journal article), published 2020-03-06. DOI: 10.1504/ijcse.2020.10027428 (https://doi.org/10.1504/ijcse.2020.10027428). Cited by 3.

Abstract

MapReduce is a typical computing model for processing and analysing big data. A MapReduce job produces a large amount of intermediate data after the map phase. During the Shuffle process, this massive intermediate data generates heavy communication across rack switches, which degrades computing performance on heterogeneous clusters. To optimise the intermediate data communication performance of map-intensive jobs, features are extracted from the pre-running scheduling information of MapReduce jobs, and the jobs are classified using machine learning. Jobs with active intermediate data communication are mapped into a single rack to preserve the communication locality of intermediate data, while jobs with inactive communication are deployed to nodes sorted by computing performance. Experimental results show that the proposed communication optimisation scheme works well for Shuffle-intensive jobs, with gains of 4%–5%. With larger amounts of input data, the scheme remains robust and adapts to heterogeneous clusters. In a multi-user application scenario, intermediate data communication is reduced by 4.1%.
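As a rough illustration of the placement policy described in the abstract, the Python sketch below routes a job either into a single rack or onto the fastest nodes in the cluster. It is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not name the classifier or its features, so a hypothetical shuffle ratio threshold (estimated map output bytes divided by map input bytes) stands in for the machine-learning model. The names `Node`, `Job`, `shuffle_ratio`, `is_shuffle_active` and `place`, the `speed` score, and the rule "pick the rack with the most free slots" are all illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str
    speed: float     # hypothetical relative computing-performance score
    free_slots: int  # task slots currently available on this node


@dataclass
class Job:
    name: str
    map_input_bytes: int
    map_output_bytes: int  # hypothetical estimate taken from a pre-run
    num_tasks: int


def shuffle_ratio(job: Job) -> float:
    """Estimated intermediate data produced per byte of map input."""
    return job.map_output_bytes / max(job.map_input_bytes, 1)


def is_shuffle_active(job: Job, threshold: float = 0.5) -> bool:
    """Stand-in for the paper's (unspecified) ML classifier: a ratio threshold."""
    return shuffle_ratio(job) >= threshold


def place(job: Job, nodes: list[Node]) -> list[str]:
    """Assign each task of `job` to a node name; updates `free_slots` in place.

    Returns fewer names than `job.num_tasks` if the candidates run out of slots.
    """
    if is_shuffle_active(job):
        # Communication-active job: confine it to one rack so the Shuffle
        # stays behind a single rack switch. Pick the rack with most free slots.
        racks: dict[str, list[Node]] = {}
        for n in nodes:
            racks.setdefault(n.rack, []).append(n)
        candidates = max(racks.values(), key=lambda rack: sum(n.free_slots for n in rack))
    else:
        # Communication-inactive job: any node in the cluster is eligible.
        candidates = nodes

    # Fill the fastest candidate nodes first.
    assignment: list[str] = []
    for n in sorted(candidates, key=lambda n: n.speed, reverse=True):
        take = min(n.free_slots, job.num_tasks - len(assignment))
        assignment.extend([n.name] * take)
        n.free_slots -= take
        if len(assignment) == job.num_tasks:
            break
    return assignment
```

For example, with two racks, a job whose estimated map output is 80% of its input is kept entirely inside the rack with more free slots. A job with a 10% ratio is instead spread over the fastest nodes cluster-wide. Sorting by speed inside the chosen rack is an added assumption; the abstract only states that communication-active jobs are mapped into one rack.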