Measuring data locality ratio in virtual MapReduce cluster using WorkflowSim
Peerasak Wangsom, K. Lavangnananda, P. Bouvry
2017 14th International Joint Conference on Computer Science and Software Engineering (JCSSE), pp. 1-6, July 2017
DOI: 10.1109/JCSSE.2017.8025944
Data locality is a significant factor with a direct impact on the performance of the MapReduce framework. Several previous works have proposed alternative scheduling algorithms that improve performance by increasing data locality. However, those studies examined data locality on physical MapReduce clusters. As more and more MapReduce clusters are deployed in virtual environments, an evaluation better suited to such clusters may be necessary. This study adopts a simulation-based approach, in which five scheduling algorithms are simulated. WorkflowSim is extended with three newly implemented modules to assess a new performance measure called ‘data locality ratio’. Comparison of the results across the schedulers reveals interesting findings. The proposed implementation can be used to assess the ‘data locality ratio’, allowing users to efficiently select and tune a scheduler and system configuration suited to an environment before the actual physical MapReduce deployment.
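The abstract does not define the ‘data locality ratio’, so the following is only an illustrative sketch under one plausible reading: the fraction of map tasks that run on a virtual machine already holding a replica of their input block. The `MapTask` type, the `data_locality_ratio` function, and the VM names are all hypothetical; none of them come from the paper or from the WorkflowSim API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapTask:
    """A map task, the VM it ran on, and the VMs holding its input block.

    Hypothetical type for illustration only; not part of WorkflowSim.
    """
    task_id: str
    assigned_vm: str
    replica_vms: frozenset


def data_locality_ratio(tasks):
    """Return the fraction of tasks that ran on a VM holding their input.

    Assumed definition: data-local tasks divided by all tasks.
    An empty schedule is reported as 0.0.
    """
    tasks = list(tasks)
    if not tasks:
        return 0.0
    local = sum(1 for t in tasks if t.assigned_vm in t.replica_vms)
    return local / len(tasks)


# Hypothetical schedule: three of the four tasks run where their data lives.
schedule = [
    MapTask("m0", "vm1", frozenset({"vm1", "vm2"})),
    MapTask("m1", "vm2", frozenset({"vm2", "vm3"})),
    MapTask("m2", "vm3", frozenset({"vm1", "vm2"})),  # input is remote
    MapTask("m3", "vm3", frozenset({"vm3"})),
]

print(data_locality_ratio(schedule))  # 0.75
```

Under this reading, comparing schedulers amounts to running each one on the same workload and comparing the resulting ratios. A virtual cluster may also warrant distinguishing VMs that share a physical host, which this sketch does not model.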