Data Prefetching for Heterogeneous Hadoop Cluster

2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS). Pub Date: 2019-03-01. DOI: 10.1109/ICACCS.2019.8728373

D. Vinutha, G. Raju

Citations: 1

Abstract

Hadoop is an open-source implementation of MapReduce. Its performance is affected by the communication overhead incurred when large datasets are transmitted to the computing node. In a heterogeneous cluster, data transmission overhead occurs whenever a map task needs to process data that is not present on the local disk. To overcome this issue, data prefetching for heterogeneous Hadoop clusters is proposed for resource optimization. Data prefetching fetches the input data in advance from the remote node to a particular computing node. Transmission of data therefore occurs in parallel with data processing, and the job execution time is reduced. Different MapReduce jobs are used to conduct the experiments. The results demonstrate that the time taken to transmit the data is reduced. Job execution time is reduced by 15% for input data sizes greater than or equal to 2 GB, and a performance improvement of 25% is obtained for a 64 MB block size.
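
The overlap of data transfer and data processing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use any Hadoop API: `run_with_prefetch`, `fetch`, and `process` are hypothetical names, and a single background thread stands in for the transfer of a block from a remote node to the computing node.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

B = TypeVar("B")  # a fetched input block
R = TypeVar("R")  # a per-block result


def run_with_prefetch(
    block_ids: Sequence[str],
    fetch: Callable[[str], B],      # hypothetical: copies one block to the local node
    process: Callable[[B], R],      # hypothetical: the map-side work on one block
) -> List[R]:
    """Process blocks in order while the next block is fetched in the background."""
    results: List[R] = []
    if not block_ids:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start transferring the first block before any processing begins.
        pending = pool.submit(fetch, block_ids[0])
        for next_id in list(block_ids[1:]) + [None]:
            block = pending.result()  # wait until the current block has arrived
            if next_id is not None:
                # Start the next transfer so that it overlaps with processing.
                pending = pool.submit(fetch, next_id)
            results.append(process(block))
    return results
```

In this sketch the wait for block i+1 is hidden behind the processing of block i, so only the first transfer is fully exposed. How much time is saved in practice depends on the ratio of transfer time to processing time, which the sketch does not model.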

