Data Prefetching for Heterogeneous Hadoop Cluster
D. Vinutha, G. Raju
2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), published 2019-03-01
DOI: 10.1109/ICACCS.2019.8728373 (https://doi.org/10.1109/ICACCS.2019.8728373)
Citations: 1

Abstract

Hadoop is an open-source implementation of MapReduce. Its performance is affected by the communication overhead incurred when large datasets are transmitted to the computing nodes. In a heterogeneous cluster, data transmission overhead occurs whenever a map task must process data that is not present on the local disk. To overcome this issue, data prefetching for heterogeneous Hadoop clusters is proposed as a resource optimization. Data prefetching fetches the input data in advance from the remote node to a particular computing node, so data transmission occurs in parallel with data processing and the job execution time is reduced. Different MapReduce jobs are used in the experiments. The results demonstrate that the time taken to transmit the data is reduced. The job execution time is reduced by 15% for input data sizes greater than or equal to 2 GB, and a performance improvement of 25% is obtained for a 64 MB block size.
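The overlap of transmission and processing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use any Hadoop API; it is a generic one-block-ahead prefetch loop in plain Python. The functions `fetch_block` and `process_block`, the block identifiers, and the simulated delays are all hypothetical stand-ins chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_block(block_id, delay=0.05):
    """Hypothetical stand-in for reading one input block from a remote node."""
    time.sleep(delay)  # simulated network transfer
    return f"block-{block_id}"


def process_block(data, delay=0.05):
    """Hypothetical stand-in for the map computation on one block."""
    time.sleep(delay)  # simulated compute time
    return data.upper()


def run_sequential(block_ids):
    """Baseline: each block is fetched only when the task needs it."""
    return [process_block(fetch_block(b)) for b in block_ids]


def run_with_prefetch(block_ids):
    """Fetch block i+1 in the background while block i is being processed."""
    results = []
    if not block_ids:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_block, block_ids[0])
        for next_id in block_ids[1:]:
            data = pending.result()                      # wait for current block
            pending = pool.submit(fetch_block, next_id)  # start the next transfer
            results.append(process_block(data))          # overlaps with the transfer
        results.append(process_block(pending.result()))  # last block
    return results


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(8))
    _, seq_t = timed(run_sequential, ids)
    _, pre_t = timed(run_with_prefetch, ids)
    print(f"sequential: {seq_t:.2f}s, with prefetch: {pre_t:.2f}s")
```

In this toy model only the first fetch is fully exposed; every later transfer is hidden behind the processing of the previous block. The printed timings come from the simulated delays and say nothing about the 15% and 25% figures reported in the abstract.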