An adaptive data placement strategy in scientific workflows over cloud computing environments

Heewon Kim, Yoonhee Kim
{"title":"An adaptive data placement strategy in scientific workflows over cloud computing environments","authors":"Heewon Kim, Yoonhee Kim","doi":"10.1109/NOMS.2018.8406191","DOIUrl":null,"url":null,"abstract":"Data of scientific workflow applications are tend to be distributed over many data centers to be effectively stored, retrieved, and transferred among them. The result of an experiment with those data shows diverse execution performance depending on the placement of input and intermediate data which are generated during application execution. However, initial data placement strategy would not be the best plan for long running experiments because of the dynamic change of resource condition time to time. We propose an adaptive data placement strategy considering dynamic resource change for efficient data-intensive applications. The strategy consists of two stages that group the datasets in data centers during the build- time stage and dynamically clusters every time newly generated datasets repeatedly to the most appropriate data centers during runtime stage, which is based on task dependency, intense degree of data usage, and just-in-time resource availability. Just-in-time data placement coming with task execution is more efficient than the one with initialization stage of experiments in the aspect of resource utilization. Experiments show that data movement can be effectively reduced while the workflow is running","PeriodicalId":19331,"journal":{"name":"NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NOMS.2018.8406191","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

Data of scientific workflow applications are tend to be distributed over many data centers to be effectively stored, retrieved, and transferred among them. The result of an experiment with those data shows diverse execution performance depending on the placement of input and intermediate data which are generated during application execution. However, initial data placement strategy would not be the best plan for long running experiments because of the dynamic change of resource condition time to time. We propose an adaptive data placement strategy considering dynamic resource change for efficient data-intensive applications. The strategy consists of two stages that group the datasets in data centers during the build- time stage and dynamically clusters every time newly generated datasets repeatedly to the most appropriate data centers during runtime stage, which is based on task dependency, intense degree of data usage, and just-in-time resource availability. Just-in-time data placement coming with task execution is more efficient than the one with initialization stage of experiments in the aspect of resource utilization. Experiments show that data movement can be effectively reduced while the workflow is running
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
云计算环境下科学工作流中的自适应数据放置策略
科学工作流应用的数据往往分布在多个数据中心,以便在多个数据中心之间进行有效的存储、检索和传输。对这些数据进行实验的结果显示,根据应用程序执行期间生成的输入数据和中间数据的位置,执行性能会有所不同。然而,由于资源条件的动态变化,初始数据放置策略并不是长期运行实验的最佳方案。针对高效的数据密集型应用,提出了一种考虑动态资源变化的自适应数据放置策略。该策略包括两个阶段,在构建时阶段对数据中心中的数据集进行分组,并在运行时阶段根据任务依赖性、数据使用强度和实时资源可用性,动态地将每次新生成的数据集重复聚集到最合适的数据中心。在资源利用方面,随任务执行而来的实时数据放置比实验初始化阶段的数据放置更有效。实验表明,该方法可以有效地减少工作流运行过程中的数据移动
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
SSH Kernel: A Jupyter Extension Specifically for Remote Infrastructure Administration Visual emulation for Ethereum's virtual machine Analyzing throughput and stability in cellular networks Network events in a large commercial network: What can we learn? Economic incentives on DNSSEC deployment: Time to move from quantity to quality
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1