基于云和网格的数据密集型计算的编程抽象

2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid Pub Date : 2009-05-18 DOI:10.1109/CCGRID.2009.87

Christopher Miceli, M. Miceli, S. Jha, Hartmut Kaiser, André Merzky

{"title":"基于云和网格的数据密集型计算的编程抽象","authors":"Christopher Miceli, M. Miceli, S. Jha, Hartmut Kaiser, André Merzky","doi":"10.1109/CCGRID.2009.87","DOIUrl":null,"url":null,"abstract":"MapReduce has emerged as an important data-parallel programming model for data-intensive computing – for Clouds and Grids. However most if not all implementations of MapReduce are coupled to a specific infrastructure. SAGA is a high-level programming interface which provides the ability to create distributed applications in an infrastructure independent way. In this paper, we show how MapReduce has been implemented using SAGA and demonstrate its interoperability across different distributed platforms – Grids, Cloud-like infrastructure and Clouds. We discuss the advantages of programmatically developing MapReduce using SAGA, by demonstrating that the SAGA-based implementation is infrastructure independent whilst still providing control over the deployment, distribution and runtime decomposition. The ability to control the distribution and placement of the computation units (workers) is critical in order to implement the ability to move computational work to the data. This is required to keep data network transfer low and in the case of commercial Clouds the monetary cost of computing the solution low. Using data-sets of size up to 10GB, and upto 10 workers, we provide detailed performance analysis of the SAGA-MapReduce implementation, and show how controllingthe distribution of computation and the payload per worker helps enhance performance.","PeriodicalId":118263,"journal":{"name":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":"{\"title\":\"Programming Abstractions for Data Intensive Computing on Clouds and Grids\",\"authors\":\"Christopher Miceli, M. Miceli, S. Jha, Hartmut Kaiser, André Merzky\",\"doi\":\"10.1109/CCGRID.2009.87\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"MapReduce has emerged as an important data-parallel programming model for data-intensive computing – for Clouds and Grids. However most if not all implementations of MapReduce are coupled to a specific infrastructure. SAGA is a high-level programming interface which provides the ability to create distributed applications in an infrastructure independent way. In this paper, we show how MapReduce has been implemented using SAGA and demonstrate its interoperability across different distributed platforms – Grids, Cloud-like infrastructure and Clouds. We discuss the advantages of programmatically developing MapReduce using SAGA, by demonstrating that the SAGA-based implementation is infrastructure independent whilst still providing control over the deployment, distribution and runtime decomposition. The ability to control the distribution and placement of the computation units (workers) is critical in order to implement the ability to move computational work to the data. This is required to keep data network transfer low and in the case of commercial Clouds the monetary cost of computing the solution low. Using data-sets of size up to 10GB, and upto 10 workers, we provide detailed performance analysis of the SAGA-MapReduce implementation, and show how controllingthe distribution of computation and the payload per worker helps enhance performance.\",\"PeriodicalId\":118263,\"journal\":{\"name\":\"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid\",\"volume\":\"70 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-05-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"43\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGRID.2009.87\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2009.87","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 43

摘要

MapReduce已经成为一个重要的数据并行编程模型，用于数据密集型计算——云和网格。然而，MapReduce的大多数(如果不是全部的话)实现都与特定的基础设施相耦合。SAGA是一个高级编程接口，它提供了以独立于基础设施的方式创建分布式应用程序的能力。在本文中，我们展示了MapReduce是如何使用SAGA实现的，并演示了它在不同分布式平台(网格、类云基础设施和云)之间的互操作性。通过展示基于SAGA的实现是独立于基础设施的，同时仍然提供对部署、分发和运行时分解的控制，我们讨论了使用SAGA以编程方式开发MapReduce的优势。为了实现将计算工作移动到数据上的能力，控制计算单元(工作者)的分布和位置的能力至关重要。这是保持低数据网络传输所需的，并且在商业云的情况下，计算解决方案的货币成本较低。使用最大10GB的数据集和最多10个worker，我们提供了SAGA-MapReduce实现的详细性能分析，并展示了如何控制计算分布和每个worker的有效负载有助于提高性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Programming Abstractions for Data Intensive Computing on Clouds and Grids

MapReduce has emerged as an important data-parallel programming model for data-intensive computing – for Clouds and Grids. However most if not all implementations of MapReduce are coupled to a specific infrastructure. SAGA is a high-level programming interface which provides the ability to create distributed applications in an infrastructure independent way. In this paper, we show how MapReduce has been implemented using SAGA and demonstrate its interoperability across different distributed platforms – Grids, Cloud-like infrastructure and Clouds. We discuss the advantages of programmatically developing MapReduce using SAGA, by demonstrating that the SAGA-based implementation is infrastructure independent whilst still providing control over the deployment, distribution and runtime decomposition. The ability to control the distribution and placement of the computation units (workers) is critical in order to implement the ability to move computational work to the data. This is required to keep data network transfer low and in the case of commercial Clouds the monetary cost of computing the solution low. Using data-sets of size up to 10GB, and upto 10 workers, we provide detailed performance analysis of the SAGA-MapReduce implementation, and show how controllingthe distribution of computation and the payload per worker helps enhance performance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid

自引率

0.00%

发文量