Emulating I/O Behavior in Scientific Workflows on High Performance Computing Systems

Fahim Chowdhury, Yue Zhu, F. Natale, A. Moody, Elsa Gonsiorowski, K. Mohror, Weikuan Yu
Published in: 2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW), 2020-11-01
DOI: 10.1109/PDSW51947.2020.00011 (https://doi.org/10.1109/PDSW51947.2020.00011)
Citations: 3

Abstract

Scientific application workflows leverage the capabilities of cutting-edge high-performance computing (HPC) facilities to enable complex applications for the academic, research, and industry communities. Data transfer and I/O dependencies among the modules of modern HPC workflows can increase complexity and hamper overall workflow performance. Understanding the complexity that arises from data dependencies and dataflow is an essential prerequisite for developing optimization strategies that improve I/O performance and, eventually, the performance of the entire workflow. In this paper, we discuss dataflow patterns for workflow applications on HPC systems. Because existing I/O benchmarking tools fall short in identifying and representing the dataflow in modern HPC workflows, we have implemented Wemul, an open-source workflow I/O emulation framework, to mimic the different types of I/O behavior exhibited by common and complex HPC application workflows for deeper analysis. We elaborate on the features and usage of Wemul, demonstrate its application to HPC workflows, and discuss insights from performance analysis results on the Lassen supercomputing cluster at Lawrence Livermore National Laboratory (LLNL).
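To make the idea of a data dependency between workflow modules concrete, the following is a minimal sketch of a two-stage producer/consumer dataflow emulated with ordinary file I/O. It is an illustration only and is not Wemul's interface, which the abstract does not describe; the function names (`producer`, `consumer`, `run_workflow`), the file naming scheme, and all parameters are hypothetical.

```python
import os
import tempfile
import time


def producer(directory, num_files, block_size, blocks_per_file):
    """Hypothetical producer stage: write files that a later stage depends on."""
    payload = b"x" * block_size
    paths = []
    for i in range(num_files):
        path = os.path.join(directory, f"stage1_{i}.dat")
        with open(path, "wb") as f:
            for _ in range(blocks_per_file):
                f.write(payload)
            # Push data to storage so the timing covers more than buffering.
            f.flush()
            os.fsync(f.fileno())
        paths.append(path)
    return paths


def consumer(paths, block_size):
    """Hypothetical consumer stage: read every file the producer wrote."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(block_size)
                if not chunk:
                    break
                total += len(chunk)
    return total


def run_workflow(num_files=4, block_size=4096, blocks_per_file=8):
    """Run producer then consumer in a scratch directory and time each stage."""
    with tempfile.TemporaryDirectory() as workdir:
        t0 = time.perf_counter()
        paths = producer(workdir, num_files, block_size, blocks_per_file)
        t1 = time.perf_counter()
        bytes_read = consumer(paths, block_size)
        t2 = time.perf_counter()
    return {
        "files": num_files,
        "bytes_written": num_files * block_size * blocks_per_file,
        "bytes_read": bytes_read,
        "write_seconds": t1 - t0,
        "read_seconds": t2 - t1,
    }


if __name__ == "__main__":
    print(run_workflow())
```

The consumer cannot start until the producer's files exist, which is the simplest form of the dataflow dependency the abstract refers to. A real emulator on an HPC system would run the stages as parallel processes against a shared file system; this sketch runs them sequentially in one process.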