Data Prefetching for Heterogeneous Hadoop Cluster
D. Vinutha, G. Raju
2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), March 2019. DOI: 10.1109/ICACCS.2019.8728373
Hadoop is an open-source implementation of MapReduce. Its performance is degraded by the communication overhead of transmitting large datasets to the computing nodes. In a heterogeneous cluster, when a map task needs to process data that is not on its node's local disk, that data must first be transferred from a remote node, which incurs transmission overhead. To overcome this issue, data prefetching for heterogeneous Hadoop clusters is proposed as a resource-optimization technique. Data prefetching fetches the input data in advance from the remote node to a particular computing node, so data transmission proceeds in parallel with data processing and job execution time is reduced. Experiments were conducted using several different MapReduce jobs. The results show that the time taken to transmit the data is reduced. Job execution time is reduced by 15% for input data sizes of 2 GB or more, and a 25% performance improvement is obtained with a 64 MB block size.
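Why overlapping transfer with processing shortens a map task can be seen in a simple timing model. The sketch below is illustrative only and is not the authors' scheduler or any Hadoop API: `Block`, `time_without_prefetch` and `time_with_prefetch` are hypothetical names, and the model assumes fixed per-block fetch and compute times, a single network link over which a background prefetcher copies remote blocks back-to-back, and an unbounded local buffer. Without prefetching, each block is fetched on demand and then processed; with prefetching, the two stages form a two-stage pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One input split for a map task (hypothetical model, not Hadoop's API)."""
    fetch_time: float    # time to copy the block from a remote node; 0 if local
    compute_time: float  # time for the map function to process the block


def time_without_prefetch(blocks: List[Block]) -> float:
    """Each block is fetched on demand, then processed: no overlap."""
    return sum(b.fetch_time + b.compute_time for b in blocks)


def time_with_prefetch(blocks: List[Block]) -> float:
    """Assumed model: a background prefetcher copies blocks back-to-back
    over one link, and the map task starts each block as soon as it has
    arrived and the previous block is finished (two-stage pipeline)."""
    fetch_done = 0.0    # when the prefetcher finishes copying the current block
    compute_done = 0.0  # when the map task finishes processing the current block
    for b in blocks:
        fetch_done += b.fetch_time
        compute_done = max(fetch_done, compute_done) + b.compute_time
    return compute_done


# Illustrative input: alternating local and remote blocks (made-up times).
blocks = [Block(0, 3), Block(4, 3), Block(0, 3), Block(4, 3)]
print(time_without_prefetch(blocks))  # 20
print(time_with_prefetch(blocks))     # 13.0
```

The numbers are invented for illustration and say nothing about the paper's measured results. The model also shows the limit of the idea: transfer time is hidden only when there is processing to hide it behind, so if remote fetches dominate, the task remains network-bound even with prefetching.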