makflow:用于集群、云和网格上的数据密集型计算的可移植抽象

SWEET '12 Pub Date : 2012-05-20 DOI:10.1145/2443416.2443417
M. Albrecht, P. Donnelly, Peter Bui, D. Thain
{"title":"makflow:用于集群、云和网格上的数据密集型计算的可移植抽象","authors":"M. Albrecht, P. Donnelly, Peter Bui, D. Thain","doi":"10.1145/2443416.2443417","DOIUrl":null,"url":null,"abstract":"In recent years, there has been a renewed interest in languages and systems for large scale distributed computing. Unfortunately, most systems available to the end user use a custom description language tightly coupled to a specific runtime implementation, making it difficult to transfer applications between systems. To address this problem we introduce Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or workflow description. Makeflow allows any user familiar with basic Unix Make syntax to generate a workflow and run it on one of many supported execution systems. Furthermore, in order to assess the performance characteristics of the various execution engines available to users and assist them in selecting one for use we introduce Workbench, a suite of benchmarks designed for analyzing common workflow patterns. We evaluate Workbench on two physical architectures -- the first a storage cluster with local disks and a slower network and the second a high performance computing cluster with a central parallel filesystem and fast network -- using a variety of execution engines. We conclude by demonstrating three applications that use Makeflow to execute data intensive applications consisting of thousands of jobs.","PeriodicalId":143151,"journal":{"name":"SWEET '12","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"145","resultStr":"{\"title\":\"Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids\",\"authors\":\"M. Albrecht, P. Donnelly, Peter Bui, D. Thain\",\"doi\":\"10.1145/2443416.2443417\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, there has been a renewed interest in languages and systems for large scale distributed computing. Unfortunately, most systems available to the end user use a custom description language tightly coupled to a specific runtime implementation, making it difficult to transfer applications between systems. To address this problem we introduce Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or workflow description. Makeflow allows any user familiar with basic Unix Make syntax to generate a workflow and run it on one of many supported execution systems. Furthermore, in order to assess the performance characteristics of the various execution engines available to users and assist them in selecting one for use we introduce Workbench, a suite of benchmarks designed for analyzing common workflow patterns. We evaluate Workbench on two physical architectures -- the first a storage cluster with local disks and a slower network and the second a high performance computing cluster with a central parallel filesystem and fast network -- using a variety of execution engines. We conclude by demonstrating three applications that use Makeflow to execute data intensive applications consisting of thousands of jobs.\",\"PeriodicalId\":143151,\"journal\":{\"name\":\"SWEET '12\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"145\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"SWEET '12\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2443416.2443417\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"SWEET '12","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2443416.2443417","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 145

摘要

近年来,人们对大规模分布式计算的语言和系统重新产生了兴趣。不幸的是,最终用户可用的大多数系统使用与特定运行时实现紧密耦合的自定义描述语言,这使得在系统之间传输应用程序变得困难。为了解决这个问题,我们引入了Makeflow,这是一个简单的系统,用于跨多个执行引擎表达和运行数据密集型工作流,而不需要更改应用程序或工作流描述。Makeflow允许任何熟悉基本Unix Make语法的用户生成工作流,并在许多支持的执行系统之一上运行它。此外,为了评估可供用户使用的各种执行引擎的性能特征,并帮助他们选择要使用的执行引擎,我们介绍了Workbench,这是一套用于分析常见工作流模式的基准测试。我们在两个物理体系结构上评估Workbench——第一个是具有本地磁盘和较慢网络的存储集群,第二个是具有中央并行文件系统和快速网络的高性能计算集群——使用各种执行引擎。最后,我们将演示三个应用程序,它们使用makflow来执行由数千个作业组成的数据密集型应用程序。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids
In recent years, there has been a renewed interest in languages and systems for large scale distributed computing. Unfortunately, most systems available to the end user use a custom description language tightly coupled to a specific runtime implementation, making it difficult to transfer applications between systems. To address this problem we introduce Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or workflow description. Makeflow allows any user familiar with basic Unix Make syntax to generate a workflow and run it on one of many supported execution systems. Furthermore, in order to assess the performance characteristics of the various execution engines available to users and assist them in selecting one for use we introduce Workbench, a suite of benchmarks designed for analyzing common workflow patterns. We evaluate Workbench on two physical architectures -- the first a storage cluster with local disks and a slower network and the second a high performance computing cluster with a central parallel filesystem and fast network -- using a variety of execution engines. We conclude by demonstrating three applications that use Makeflow to execute data intensive applications consisting of thousands of jobs.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
DAGwoman: enabling DAGMan-like workflows on non-Condor platforms Turbine: a distributed-memory dataflow engine for extreme-scale many-task applications Evaluating parameter sweep workflows in high performance computing Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids Oozie: towards a scalable workflow management system for Hadoop
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1