Understanding Data Motion in the Modern HPC Data Center

Glenn K. Lockwood, S. Snyder, S. Byna, P. Carns, N. Wright
{"title":"Understanding Data Motion in the Modern HPC Data Center","authors":"Glenn K. Lockwood, S. Snyder, S. Byna, P. Carns, N. Wright","doi":"10.1109/PDSW49588.2019.00012","DOIUrl":null,"url":null,"abstract":"The utilization and performance of storage, compute, and network resources within HPC data centers have been studied extensively, but much less work has gone toward characterizing how these resources are used in conjunction to solve larger scientific challenges. To address this gap, we present our work in characterizing workloads and workflows at a data-center-wide level by examining all data transfers that occurred between storage, compute, and the external network at the National Energy Research Scientific Computing Center over a three-month period in 2019. Using a simple abstract representation of data transfers, we analyze over 100 million transfer logs from Darshan, HPSS user interfaces, and Globus to quantify the load on data paths between compute, storage, and the wide-area network based on transfer direction, user, transfer tool, source, destination, and time. We show that parallel I/O from user jobs, while undeniably important, is only one of several major I/O workloads that occurs throughout the execution of scientific workflows. We also show that this approach can be used to connect anomalous data traffic to specific users and file access patterns, and we construct time-resolved user transfer traces to demonstrate that one can systematically identify coupled data motion for individual workflows.","PeriodicalId":130430,"journal":{"name":"2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDSW49588.2019.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

The utilization and performance of storage, compute, and network resources within HPC data centers have been studied extensively, but much less work has gone toward characterizing how these resources are used in conjunction to solve larger scientific challenges. To address this gap, we present our work in characterizing workloads and workflows at a data-center-wide level by examining all data transfers that occurred between storage, compute, and the external network at the National Energy Research Scientific Computing Center over a three-month period in 2019. Using a simple abstract representation of data transfers, we analyze over 100 million transfer logs from Darshan, HPSS user interfaces, and Globus to quantify the load on data paths between compute, storage, and the wide-area network based on transfer direction, user, transfer tool, source, destination, and time. We show that parallel I/O from user jobs, while undeniably important, is only one of several major I/O workloads that occurs throughout the execution of scientific workflows. We also show that this approach can be used to connect anomalous data traffic to specific users and file access patterns, and we construct time-resolved user transfer traces to demonstrate that one can systematically identify coupled data motion for individual workflows.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
理解现代HPC数据中心中的数据运动
HPC数据中心内存储、计算和网络资源的利用率和性能已经得到了广泛的研究,但在描述如何将这些资源结合起来解决更大的科学挑战方面的工作要少得多。为了解决这一差距,我们通过检查2019年三个月期间国家能源研究科学计算中心存储、计算和外部网络之间发生的所有数据传输,介绍了我们在数据中心范围内描述工作负载和工作流程的工作。使用数据传输的简单抽象表示,我们分析了来自Darshan、HPSS用户界面和Globus的超过1亿个传输日志,以根据传输方向、用户、传输工具、源、目的地和时间量化计算、存储和广域网之间数据路径上的负载。我们展示了来自用户作业的并行I/O,虽然不可否认它很重要,但它只是科学工作流执行过程中出现的几个主要I/O工作负载之一。我们还表明,这种方法可用于将异常数据流量连接到特定用户和文件访问模式,并且我们构建了时间解析的用户传输跟踪,以证明可以系统地识别单个工作流的耦合数据运动。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
[Copyright notice] Applying Machine Learning to Understand Write Performance of Large-scale Parallel Filesystems Towards Physical Design Management in Storage Systems Profiling Platform Storage Using IO500 and Mistral Active Learning-based Automatic Tuning and Prediction of Parallel I/O Performance
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1