Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids

SWEET '12. Pub Date: 2012-05-20. DOI: 10.1145/2443416.2443417

M. Albrecht, P. Donnelly, Peter Bui, D. Thain


Citations: 145

Abstract




In recent years, there has been renewed interest in languages and systems for large-scale distributed computing. Unfortunately, most systems available to end users rely on a custom description language tightly coupled to a specific runtime implementation, which makes it difficult to move applications between systems. To address this problem, we introduce Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or the workflow description. Makeflow allows any user familiar with basic Unix Make syntax to generate a workflow and run it on one of many supported execution systems. Furthermore, to assess the performance characteristics of the execution engines available to users and to help them select one, we introduce Workbench, a suite of benchmarks designed for analyzing common workflow patterns. We evaluate Workbench with a variety of execution engines on two physical architectures: a storage cluster with local disks and a slower network, and a high-performance computing cluster with a central parallel filesystem and a fast network. We conclude by demonstrating three applications that use Makeflow to run data-intensive computations consisting of thousands of jobs.
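The idea that a Make-style description can stay fixed while the execution engine changes can be illustrated with a small Python sketch. This is not Makeflow's implementation or interface: the rule text, the `split_tool` command, and the function names `parse_rules`, `schedule`, `run`, and `dry_run_engine` are all hypothetical, and the parser handles only the simplest `targets: sources` plus one tab-indented command form. It shows how rules imply a dependency graph, and how the same graph can be handed to any callable standing in for an engine.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow in basic Make syntax: "targets: sources",
# followed by one tab-indented command. split_tool is a made-up program.
WORKFLOW = """\
part1.out: input.dat
\tsplit_tool input.dat 1 > part1.out
part2.out: input.dat
\tsplit_tool input.dat 2 > part2.out
merged.out: part1.out part2.out
\tcat part1.out part2.out > merged.out
"""


def parse_rules(text):
    """Parse Make-style text into a list of rule dictionaries."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("\t"):
            # A tab-indented line is the command of the most recent rule.
            rules[-1]["command"] = line.strip()
        else:
            targets, _, sources = line.partition(":")
            rules.append({"targets": targets.split(),
                          "sources": sources.split(),
                          "command": ""})
    return rules


def schedule(rules):
    """Order rules so each one follows the rules that produce its sources."""
    producer = {t: i for i, r in enumerate(rules) for t in r["targets"]}
    # Map each rule index to the indices of the rules it depends on.
    # Sources with no producing rule (such as input.dat) are plain inputs.
    graph = {i: {producer[s] for s in r["sources"] if s in producer}
             for i, r in enumerate(rules)}
    return [rules[i] for i in TopologicalSorter(graph).static_order()]


def run(rules, engine):
    """Hand each command to the engine in dependency order."""
    for rule in schedule(rules):
        engine(rule["command"])


def dry_run_engine(command):
    """Stand-in engine that only reports what it would submit."""
    print("would submit:", command)


if __name__ == "__main__":
    run(parse_rules(WORKFLOW), dry_run_engine)
```

Replacing `dry_run_engine` with another callable (for example, one that wraps a batch-system submit command) would leave `WORKFLOW` untouched, which is the portability property the abstract describes. A real system would also need to run independent rules in parallel and move files between nodes, which this sketch ignores.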


Source venue

SWEET '12

Self-citation rate

0.00%

Publication count