JUWELS Booster - Early User Experiences
A. Herten
Proceedings of the 2021 on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn STrategy, 2021-06-25
DOI: 10.1145/3452412.3462752 (https://doi.org/10.1145/3452412.3462752)
Citations: 1

Abstract

Over the last few years, GPUs have become ubiquitous in HPC installations around the world. Today, they provide the main source of performance in a number of Top500 machines, for example Summit, Sierra, and JUWELS Booster. GPUs have also been selected as key enablers for the upcoming Exascale era and will be installed in large numbers. While individual GPU devices already offer plenty of performance (O(10) TFLOP/s FP64), current and next-generation supercomputers employ them in the thousands. Using these machines to the fullest extent means not only utilizing individual devices efficiently, but also using the entire interconnected system of devices thoroughly. JUWELS Booster is a recently installed Tier-0/1 system at Jülich Supercomputing Centre (JSC), currently the 7th-fastest supercomputer in the world and the fastest in Europe. JUWELS Booster features 936 nodes, each equipped with 4 NVIDIA A100 Tensor Core GPUs and 4 Mellanox HDR200 InfiniBand HCAs. The peak performance of all GPUs together sums up to 73 PFLOP/s, and the system features a DragonFly+ network topology with 800 Gbit/s network injection bandwidth per node. During installation of JUWELS Booster, a selected set of applications was given access to the system as part of the JUWELS Booster Early Access Program. To prepare for their first compute time allocation, scientific users were able to gain first experiences on the machine. They gave direct feedback to the system operations team during installation and beyond. Close collaboration with the application support staff of JSC was facilitated, giving unique insights into the individual processes of utilizing a brand-new large-scale system for the first time. Likewise, performance profiles of applications could be studied and collaboratively analyzed, employing available tools and methods. Performance limiters of each specific application on the platform were identified and proposals for improvement were developed.
This talk will present first experiences with JUWELS Booster and the applications utilizing the system during its first months. Applied methods for onboarding, analysis, and optimization will be shown and assessed. Highlights of the state of the art of performance analysis and modeling for GPUs will be presented with concrete examples from the JUWELS Booster Early Access Program.
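
The aggregate figures quoted in the abstract (73 PFLOP/s peak, 800 Gbit/s injection bandwidth per node) can be cross-checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: the node, GPU, and HCA counts come from the abstract, while the per-GPU peak of 19.5 TFLOP/s (FP64 Tensor Core) and the per-HCA rate of 200 Gbit/s are assumptions that the abstract does not state. The function name `system_totals` is hypothetical.

```python
# Back-of-the-envelope check of the system totals quoted in the abstract.

NODES = 936          # from the abstract
GPUS_PER_NODE = 4    # from the abstract
HCAS_PER_NODE = 4    # from the abstract

# Assumptions (NOT stated in the abstract):
A100_FP64_TC_TFLOPS = 19.5  # assumed per-GPU FP64 Tensor Core peak, TFLOP/s
HDR200_GBITS = 200          # assumed per-HCA link rate, Gbit/s


def system_totals(nodes, gpus_per_node, hcas_per_node, gpu_tflops, hca_gbits):
    """Return (total GPUs, peak PFLOP/s, injection bandwidth per node in Gbit/s)."""
    total_gpus = nodes * gpus_per_node
    peak_pflops = total_gpus * gpu_tflops / 1000.0  # TFLOP/s -> PFLOP/s
    injection_gbits = hcas_per_node * hca_gbits
    return total_gpus, peak_pflops, injection_gbits


total_gpus, peak_pflops, injection_gbits = system_totals(
    NODES, GPUS_PER_NODE, HCAS_PER_NODE, A100_FP64_TC_TFLOPS, HDR200_GBITS
)

print(f"{total_gpus} GPUs, {peak_pflops:.1f} PFLOP/s peak, "
      f"{injection_gbits} Gbit/s injection bandwidth per node")
# prints: 3744 GPUs, 73.0 PFLOP/s peak, 800 Gbit/s injection bandwidth per node
```

Under these assumed per-device values, the totals agree with the numbers in the abstract. With a different per-GPU peak (for example a non-Tensor-Core FP64 rate), the aggregate would differ accordingly.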