HPC AI500 V3.0: A scalable HPC AI benchmarking framework

Zihan Jiang, Chunjie Luo, Wanling Gao, Lei Wang, Jianfeng Zhan
{"title":"HPC AI500 V3.0:一个可扩展的HPC AI基准测试框架","authors":"Zihan Jiang ,&nbsp;Chunjie Luo ,&nbsp;Wanling Gao ,&nbsp;Lei Wang ,&nbsp;Jianfeng Zhan","doi":"10.1016/j.tbench.2022.100083","DOIUrl":null,"url":null,"abstract":"<div><p>In recent years, the convergence of High Performance Computing (HPC) and artificial intelligence (AI) makes the community desperately need a benchmark to guide the design of next-generation scalable HPC AI systems. The success of the HPL benchmarks and the affiliated TOP500 ranking indicates that scalability is the fundamental requirement to evaluate HPC systems. However, being scalable in terms of these emerging AI workloads like deep learning (DL) raises nontrivial challenges. This paper formally and systematically analyzes the factor that limits scalability in DL workloads and presents HPC AI500 v3.0, a scalable HPC AI benchmarking framework. The HPC AI500 V3.0 methodology is inspired by bagging, which utilizes the collective wisdom of an ensemble of base models and enables the benchmarks to be adaptively scalable to different scales of HPC systems. We implement HPC AI500 V3.0 in a highly customizable manner, maintaining the space of various optimization from both system and algorithm levels. By reusing the representative workloads in HPC AI500 V2.0, we evaluate HPC AI500 V3.0 on typical HPC systems, and the results show it has near-linear scalability. Furthermore, based on the customizable design, we present a case study to perform a trade-off between AI model quality and its training speed. The source code of HPC AI500 V3.0 is publicly available from the HPC AI500 project homepage  <span>https://www.benchcouncil.org/aibench/hpcai500/</span><svg><path></path></svg>.</p></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"2 4","pages":"Article 100083"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2772485922000709/pdfft?md5=00638cfe1547defd3a92b0141166ca91&pid=1-s2.0-S2772485922000709-main.pdf","citationCount":"0","resultStr":"{\"title\":\"HPC AI500 V3.0: A scalable HPC AI benchmarking framework\",\"authors\":\"Zihan Jiang ,&nbsp;Chunjie Luo ,&nbsp;Wanling Gao ,&nbsp;Lei Wang ,&nbsp;Jianfeng Zhan\",\"doi\":\"10.1016/j.tbench.2022.100083\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>In recent years, the convergence of High Performance Computing (HPC) and artificial intelligence (AI) makes the community desperately need a benchmark to guide the design of next-generation scalable HPC AI systems. The success of the HPL benchmarks and the affiliated TOP500 ranking indicates that scalability is the fundamental requirement to evaluate HPC systems. However, being scalable in terms of these emerging AI workloads like deep learning (DL) raises nontrivial challenges. This paper formally and systematically analyzes the factor that limits scalability in DL workloads and presents HPC AI500 v3.0, a scalable HPC AI benchmarking framework. The HPC AI500 V3.0 methodology is inspired by bagging, which utilizes the collective wisdom of an ensemble of base models and enables the benchmarks to be adaptively scalable to different scales of HPC systems. We implement HPC AI500 V3.0 in a highly customizable manner, maintaining the space of various optimization from both system and algorithm levels. 
By reusing the representative workloads in HPC AI500 V2.0, we evaluate HPC AI500 V3.0 on typical HPC systems, and the results show it has near-linear scalability. Furthermore, based on the customizable design, we present a case study to perform a trade-off between AI model quality and its training speed. The source code of HPC AI500 V3.0 is publicly available from the HPC AI500 project homepage  <span>https://www.benchcouncil.org/aibench/hpcai500/</span><svg><path></path></svg>.</p></div>\",\"PeriodicalId\":100155,\"journal\":{\"name\":\"BenchCouncil Transactions on Benchmarks, Standards and Evaluations\",\"volume\":\"2 4\",\"pages\":\"Article 100083\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2772485922000709/pdfft?md5=00638cfe1547defd3a92b0141166ca91&pid=1-s2.0-S2772485922000709-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BenchCouncil Transactions on Benchmarks, Standards and Evaluations\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2772485922000709\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772485922000709","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract


In recent years, the convergence of High Performance Computing (HPC) and artificial intelligence (AI) has created an urgent need in the community for a benchmark to guide the design of next-generation scalable HPC AI systems. The success of the HPL benchmark and the affiliated TOP500 ranking indicates that scalability is the fundamental requirement for evaluating HPC systems. However, achieving scalability for emerging AI workloads such as deep learning (DL) raises nontrivial challenges. This paper formally and systematically analyzes the factors that limit scalability in DL workloads and presents HPC AI500 V3.0, a scalable HPC AI benchmarking framework. The HPC AI500 V3.0 methodology is inspired by bagging: it exploits the collective wisdom of an ensemble of base models, which allows the benchmarks to scale adaptively to HPC systems of different sizes. We implement HPC AI500 V3.0 in a highly customizable manner, preserving room for various optimizations at both the system and algorithm levels. By reusing the representative workloads of HPC AI500 V2.0, we evaluate HPC AI500 V3.0 on typical HPC systems, and the results show near-linear scalability. Furthermore, based on the customizable design, we present a case study that trades off AI model quality against training speed. The source code of HPC AI500 V3.0 is publicly available from the HPC AI500 project homepage https://www.benchcouncil.org/aibench/hpcai500/.
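The methodology described above hinges on one property of bagging: every base model is trained independently on its own bootstrap resample, so adding nodes simply adds base models with almost no cross-node coordination, and that independence is what the near-linear scalability results reflect. The sketch below is not HPC AI500 code; it is a minimal, hypothetical illustration of the bagging principle, using scikit-learn decision trees on synthetic data as a stand-in for the framework's DL workloads. The names train_base_model and majority_vote and the n_models knob are assumptions introduced only for this example.

from concurrent.futures import ProcessPoolExecutor

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def train_base_model(seed, X, y):
    """Fit one base model on a bootstrap resample; independent of all other models."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
    model = DecisionTreeClassifier(max_depth=8, random_state=seed)
    model.fit(X[idx], y[idx])
    return model


def majority_vote(models, X):
    """Aggregate the ensemble's predictions by per-sample majority vote."""
    votes = np.stack([m.predict(X) for m in models])  # shape: (n_models, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)


if __name__ == "__main__":
    X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
    X_train, y_train, X_test, y_test = X[:3000], y[:3000], X[3000:], y[3000:]

    # n_models is the scaling knob: each base model trains independently, so the
    # training stage is embarrassingly parallel across workers (or cluster nodes).
    n_models = 8
    with ProcessPoolExecutor(max_workers=n_models) as pool:
        futures = [pool.submit(train_base_model, seed, X_train, y_train)
                   for seed in range(n_models)]
        ensemble = [f.result() for f in futures]

    acc = (majority_vote(ensemble, X_test) == y_test).mean()
    print(f"bagged ensemble of {n_models} trees, test accuracy = {acc:.3f}")

In this toy setting, n_models plays the role of the quality-versus-speed knob mentioned in the case study: under a fixed wall-clock budget, a larger ensemble generally improves prediction quality, while a smaller one finishes sooner, which is the kind of trade-off the customizable design of HPC AI500 V3.0 is meant to expose.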
