
ISC Workshops: Latest Publications

Challenges and Opportunities for RISC-V Architectures towards Genomics-based Workloads
Pub Date : 2023-06-27 DOI: 10.48550/arXiv.2306.15562
Gonzalo Gómez-Sánchez, A. Call, Xavier Teruel, Lorena Alonso, Ignasi Morán, Miguel Angel Perez, D. Torrents, J. L. Berral
The use of large-scale supercomputing architectures is a hard requirement for Big-Data applications in scientific computing. An example is genomics analytics, where millions of data transformations and tests per patient need to be performed to find relevant clinical indicators. To ensure open and broad access to high-performance technologies, governments and academia are therefore pushing for the introduction of novel computing architectures in large-scale scientific environments. This is the case for RISC-V, an open-source and royalty-free instruction-set architecture. To evaluate such technologies, here we present the Variant-Interaction Analytics use-case benchmarking suite and datasets. Through this use case, we search for possible genetic interactions using computational and statistical methods, providing a representative case of heavy ETL (Extract, Transform, Load) data processing. The current implementation runs on x86-based supercomputers (e.g. MareNostrum-IV at the Barcelona Supercomputing Center (BSC)), and future steps propose RISC-V as part of the next MareNostrum generations. Here we describe the Variant Interaction use case, highlighting the characteristics that leverage high-performance computing and, based on a first comparison between x86 and RISC-V architectures on real Variant Interaction executions over real hardware, indicating the caveats and challenges for the RISC-V developments and designs to come.
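To make the ETL character of this workload concrete, a minimal sketch of the statistical core follows: for one pair of variants, tabulate genotype combinations against case/control status and compute a chi-square statistic. This is an illustrative reconstruction assuming a 0/1/2 genotype encoding, not code from the benchmarking suite; the chi_square_pair helper and its data layout are inventions for the example.

```cpp
#include <array>
#include <cstdint>
#include <iostream>
#include <vector>

// Genotypes are assumed to be encoded 0/1/2 (copies of the minor allele).
// For one variant pair, tabulate the 9 genotype combinations separately for
// cases and controls, then compute a chi-square statistic over the 2x9 table.
double chi_square_pair(const std::vector<uint8_t>& g1,
                       const std::vector<uint8_t>& g2,
                       const std::vector<uint8_t>& is_case) {
    std::array<std::array<double, 9>, 2> obs{};  // [case/control][combo]
    for (size_t i = 0; i < g1.size(); ++i)
        obs[is_case[i]][g1[i] * 3 + g2[i]] += 1.0;

    std::array<double, 9> col_tot{};
    std::array<double, 2> row_tot{};
    double n = 0.0;
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 9; ++c) {
            row_tot[r] += obs[r][c];
            col_tot[c] += obs[r][c];
            n += obs[r][c];
        }

    double chi2 = 0.0;
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 9; ++c) {
            const double expected = row_tot[r] * col_tot[c] / n;
            if (expected > 0.0)
                chi2 += (obs[r][c] - expected) * (obs[r][c] - expected) / expected;
        }
    return chi2;  // compare against a threshold to flag candidate pairs
}

int main() {
    std::vector<uint8_t> g1 = {0, 1, 2, 1, 0, 2}, g2 = {1, 1, 0, 2, 0, 2};
    std::vector<uint8_t> status = {1, 1, 0, 1, 0, 0};  // 1 = case, 0 = control
    std::cout << "chi2 = " << chi_square_pair(g1, g2, status) << "\n";
}
```

The full analysis repeats a test of this kind over millions of variant pairs per cohort, which is what makes the workload a heavy ETL case.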
Citations: 0
Software Development Vehicles to enable extended and early co-design: a RISC-V and HPC case of study
Pub Date : 2023-06-01 DOI: 10.48550/arXiv.2306.01797
F. Mantovani, Pablo Vizcaino, Fabio Banchelli, M. Garcia-Gasulla, R. Ferrer, Giorgos Ieronymakis, Nikos Dimou, Vassilis D. Papaefstathiou, Jesús Labarta
Prototyping HPC systems with low-to-mid technology readiness level (TRL) hardware is critical for providing feedback to hardware designers, the system software team (e.g., compiler developers), and early adopters from the scientific community. The typical approach to hardware design and HPC system prototyping often limits feedback or only allows it at a late stage. In this paper, we present a set of tools for co-designing HPC systems, called software development vehicles (SDV). We use an innovative RISC-V design as a demonstrator, which includes a scalar CPU and a vector processing unit capable of operating on large vectors of up to 16 kbits. We present an incremental methodology and early tangible evidence that the co-design process can provide feedback to improve both architecture and system software at a very early stage of system development.
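To give a feel for what operating on vectors of up to 16 kbits means in code: a 16-kbit register holds 256 doubles, and vector-length-agnostic software strip-mines loops in hardware-granted chunks instead of hard-coding that width. The sketch below imitates the pattern in plain C++; hw_vector_length_doubles stands in for a vsetvl-style run-time query and is an assumption, not part of the SDV tooling.

```cpp
#include <algorithm>
#include <cstddef>

// Stand-in for a vsetvl-style hardware query: a 16-kbit vector register
// holds 16384 / 64 = 256 doubles. Real VLA code asks the hardware at run
// time rather than assuming a width. (Assumed helper, for illustration.)
constexpr std::size_t hw_vector_length_doubles() { return 16384 / 64; }

// daxpy strip-mined the way vector-length-agnostic RISC-V code is
// structured: each outer iteration handles one hardware vector's worth of
// elements, so the same loop runs unchanged on any vector width.
void daxpy(std::size_t n, double a, const double* x, double* y) {
    const std::size_t vl_max = hw_vector_length_doubles();
    for (std::size_t i = 0; i < n;) {
        const std::size_t vl = std::min(vl_max, n - i);  // granted length
        for (std::size_t j = 0; j < vl; ++j)             // one vector op's worth
            y[i + j] += a * x[i + j];
        i += vl;
    }
}

int main() {
    double x[300], y[300];
    for (int i = 0; i < 300; ++i) { x[i] = i; y[i] = 1.0; }
    daxpy(300, 2.0, x, y);  // 300 > 256 forces a second, shorter strip
    return y[299] == 1.0 + 2.0 * 299 ? 0 : 1;
}
```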
Citations: 2
Backporting RISC-V Vector assembly
Pub Date : 2023-04-20 DOI: 10.48550/arXiv.2304.10324
Joseph K. L. Lee, Maurice Jamieson, Nick Brown
Leveraging vectorisation, the ability of a CPU to apply operations to multiple elements of data concurrently, is critical for high-performance workloads. However, at the time of writing, commercially available physical RISC-V hardware that provides the RISC-V vector extension (RVV) only supports version 0.7.1, which is incompatible with the latest ratified version 1.0. The challenge is that upstream compiler toolchains, such as Clang, only target the ratified v1.0 and do not support the older v0.7.1. Because v1.0 is not compatible with v0.7.1, the only way to program vectorised code is to use a vendor-provided, older compiler. In this paper we introduce the rvv-rollback tool, which translates assembly code generated by a compiler targeting vector extension v1.0 into v0.7.1 instructions. We use this tool to compare the vectorisation performance of the vendor-provided GNU 8.4 compiler (supports v0.7.1) against LLVM 15.0 (supports only v1.0). We found that the LLVM compiler is capable of auto-vectorising more computational kernels and delivers greater performance than GNU in most, but not all, cases. We also tested LLVM vectorisation with vector-length-agnostic and vector-length-specific settings, and observed cases with significant differences in performance.
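To illustrate the kind of rewriting rvv-rollback performs, consider an ordinary vectorisable loop; the comments sketch how instructions from a v1.0 sequence map to v0.7.1 equivalents. The spellings follow the two public RVV specifications (for instance, unit-stride 32-bit loads changed name between versions), but the exact sequences any given compiler emits will differ, so treat the mapping as illustrative.

```cpp
#include <cstddef>

// An ordinary loop that Clang auto-vectorises to RVV v1.0 assembly;
// rvv-rollback then rewrites that assembly so it runs on v0.7.1 hardware
// such as the XuanTie C906. Illustrative mapping (per the two RVV specs;
// real compiler output will differ):
//   v1.0: vsetvli t0, a0, e32, m1, ta, ma  ->  v0.7.1: vsetvli t0, a0, e32, m1
//   v1.0: vle32.v v8, (a1)                 ->  v0.7.1: vlw.v   v8, (a1)
//   v1.0: vse32.v v8, (a2)                 ->  v0.7.1: vsw.v   v8, (a2)
void scale(std::size_t n, float a, const float* x, float* y) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i];
}

int main() {
    float x[8] = {1, 2, 3, 4, 5, 6, 7, 8}, y[8];
    scale(8, 2.0f, x, y);
    return static_cast<int>(y[0]);  // keeps the call from being optimised away
}
```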
Citations: 3
Test-driving RISC-V Vector hardware for HPC
Pub Date : 2023-04-20 DOI: 10.48550/arXiv.2304.10319
Joseph K. L. Lee, Maurice Jamieson, Nick Brown, Ricardo Jesus
Whilst the RISC-V Vector extension (RVV) has been ratified, at the time of writing both hardware implementations and open-source software support for vectorisation on RISC-V are still limited. This matters because vectorisation is crucial to obtaining good performance for High Performance Computing (HPC) workloads and, as of April 2023, the Allwinner D1 SoC, containing the XuanTie C906 processor, is the only mass-produced and commercially available hardware supporting RVV. This paper surveys the state of RISC-V vectorisation as of 2023, reporting on both the hardware and software ecosystem. Driving our discussion from experiences in setting up the Allwinner D1 as part of the EPCC RISC-V testbed, we report the results of benchmarking the D1 using the RAJA Performance Suite, which demonstrated reasonable vectorisation speedup with the vendor-provided compiler, as well as favourable performance compared to the StarFive VisionFive V2 with SiFive's U74 processor.
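For readers unfamiliar with the RAJA Performance Suite, the sketch below shows the forall style its kernels are written in, using a triad-like loop. It is not a kernel copied from the suite, and the D1-specific build flags (the vendor GCC's v0.7.1-enabled -march string) are toolchain-specific and omitted here.

```cpp
#include <RAJA/RAJA.hpp>
#include <cstdio>

int main() {
    const int N = 1 << 20;
    double* a = new double[N];
    double* b = new double[N];
    double* c = new double[N];
    for (int i = 0; i < N; ++i) { b[i] = 1.0; c[i] = 2.0; }
    const double alpha = 0.5;

    // Triad-style kernel in RAJA's forall style; swapping the execution
    // policy (e.g. seq_exec -> omp_parallel_for_exec) retargets the same
    // loop body, which is what makes the suite convenient for compiler
    // and vectorisation studies.
    RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, N), [=](int i) {
        a[i] = b[i] + alpha * c[i];
    });

    std::printf("a[0] = %f (expect 2.0)\n", a[0]);
    delete[] a; delete[] b; delete[] c;
    return 0;
}
```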
Citations: 5
Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators
Pub Date : 2023-04-09 DOI: 10.48550/arXiv.2304.04276
Yehonatan Fridman, G. Tamir, Gal Oren
Over the last decade, most of the increase in computing power has come from advances in accelerated many-core architectures, mainly in the form of GPGPUs. While accelerators achieve phenomenal performance in various computing tasks, using them requires code adaptations and transformations. Thus OpenMP, the most common standard for multi-threading in scientific computing applications, has offered offloading capabilities between hosts (CPUs) and accelerators since v4.0, with growing support in the successive v4.5, v5.0, v5.1, and the latest v5.2 versions. Recently, two state-of-the-art GPUs, the Intel Ponte Vecchio Max 1100 and the NVIDIA A100, were released to the market, with the oneAPI and NVHPC compilers for offloading, respectively. In this work, we present early performance results of OpenMP offloading to these devices, specifically analyzing the portability of advanced directives (using SOLLVE's OMPVV test suite) and the scalability of the hardware on a representative scientific mini-app (the LULESH benchmark). Our results show that coverage of version 4.5 is nearly complete in both the latest NVHPC and oneAPI tools. However, we observed a lack of support for versions 5.0, 5.1, and 5.2, which is particularly noticeable with NVHPC. From the performance perspective, we found the PVC1100 and A100 to be broadly comparable on the LULESH benchmark. While the A100 is slightly better due to faster memory bandwidth, the PVC1100 scales to the next problem size (400^3) thanks to its larger memory.
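As a reference point for the directives under test, here is a minimal OpenMP target offload of a vector update, the kind of construct the OMPVV suite probes. It is a generic example rather than one of the LULESH kernels, and the vendor compile commands mentioned in the comments should be checked against the respective documentation.

```cpp
#include <cstdio>
#include <vector>

// Minimal OpenMP 4.5-style target offload of a vector update. Build
// examples (check vendor docs): oneAPI `icpx -fiopenmp -fopenmp-targets=spir64`,
// NVHPC `nvc++ -mp=gpu`. Falls back to host execution if no device exists.
int main() {
    const int n = 1 << 20;
    std::vector<double> x(n, 1.0), y(n, 2.0);
    double* xp = x.data();
    double* yp = y.data();
    const double a = 0.5;

    // Map the arrays onto the device, distribute the loop across teams and
    // threads, and copy y back when the target region ends.
    #pragma omp target teams distribute parallel for \
        map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (int i = 0; i < n; ++i)
        yp[i] += a * xp[i];

    std::printf("y[0] = %f (expect 2.5)\n", yp[0]);
    return 0;
}
```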
Citations: 1
Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads
Pub Date : 2022-12-03 DOI: 10.48550/arXiv.2212.01698
R. Caspart, Sebastian Ziegler, Arvid Weyrauch, Holger Obermaier, Simon Raffeiner, Leonie Schuhmacher, J. Scholtyssek, D. Trofimova, M. Nolden, I. Reinartz, Fabian Isensee, Markus Goetz, C. Debus
With the rise of artificial intelligence (AI) in recent years and the subsequent increase in the complexity of the applied models, the growing demand for computational resources is starting to pose a significant challenge. The need for higher compute power is being met with increasingly potent accelerator hardware as well as the use of large and powerful compute clusters. However, the gain in prediction accuracy from large models trained on distributed and accelerated systems ultimately comes at the price of a substantial increase in energy demand, and researchers have started questioning the environmental friendliness of such AI methods at scale. Consequently, awareness of energy efficiency plays an important role for AI model developers and hardware infrastructure operators alike. The energy consumption of AI workloads depends both on the model implementation and on the composition of the utilized hardware. Accurate measurements of the power draw of respective AI workflows on different types of compute nodes are therefore key to algorithmic improvements and to the design of future compute clusters and hardware. Towards this end, we present measurements of the energy consumption of two typical applications of deep learning models on different types of heterogeneous compute nodes. Our results indicate that 1. contrary to common approaches, deriving energy consumption directly from runtime is not accurate; instead, the composition of the compute node needs to be considered; 2. neglecting accelerator hardware on mixed nodes results in disproportionate inefficiency regarding energy consumption; 3. the energy consumption of model training and inference should be considered separately: while training on GPUs outperforms all other node types regarding both runtime and energy consumption, inference on CPU nodes can be comparably efficient. One advantage of our approach is that the information on energy consumption is available to all users of the supercomputer, not just those with administrator rights, enabling an easy transfer to other workloads alongside a rise in user awareness of energy consumption.
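The paper draws its numbers from node-level telemetry exposed to all users of the cluster; as a generic on-node illustration of counter-based energy measurement, the sketch below samples the Linux powercap (RAPL) package-energy counter around a workload. This covers only the CPU package; accelerator energy has to be sampled separately, which is precisely the mixed-node pitfall discussed above. The sysfs path is standard for Intel CPUs with the powercap driver, but the placeholder workload and the omitted counter-wraparound handling are simplifications.

```cpp
#include <cstdint>
#include <fstream>
#include <iostream>

// Read the cumulative CPU-package energy counter in microjoules via the
// Linux powercap (RAPL) interface. CPU package only: GPU energy has to be
// sampled separately (e.g. via vendor tools), which is exactly why
// CPU-only accounting misstates mixed-node workloads.
uint64_t read_energy_uj() {
    std::ifstream f("/sys/class/powercap/intel-rapl:0/energy_uj");
    uint64_t uj = 0;
    f >> uj;
    return uj;
}

int main() {
    const uint64_t before = read_energy_uj();

    volatile double acc = 0.0;  // placeholder workload (assumption)
    for (long i = 0; i < 200000000L; ++i) acc += 1e-9 * i;

    const uint64_t after = read_energy_uj();
    // NB: the counter wraps at max_energy_range_uj; production code must
    // detect and correct the rollover.
    std::cout << "package energy: " << (after - before) / 1e6 << " J\n";
    return 0;
}
```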
Citations: 3
Workflows to driving high-performance interactive supercomputing for urgent decision making
Pub Date : 2022-06-28 DOI: 10.48550/arXiv.2206.14103
Nick Brown, R. Nash, G. Gibb, E. Belikov, Artur Podobas, W. Chien, S. Markidis, M. Flatken, A. Gerndt
Interactive urgent computing is a small but growing user of supercomputing resources. However, there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads that could benefit from the computational power delivered by such instruments. An important question is how to connect the different components of an urgent workload, namely the users, the simulation codes, and external data sources, together in a structured and accessible manner. In this paper we explore the role of workflows both from the perspective of marshalling and control of urgent workloads and at the level of the individual HPC machine. Using a space weather prediction urgent use case, which ultimately requires two workflow systems, we explore the benefits these two systems provide, especially when one exploits the flexibility enabled by their interoperation.
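As a hedged sketch of the two-level structure described here: an outer marshalling loop reacts to newly arrived observational data and hands each item to the machine-level layer by submitting a batch job. Every concrete detail below (the watched directory, the run_simulation.sh script, the use of Slurm's sbatch) is an assumption for illustration; the abstract does not name the paper's actual workflow systems.

```cpp
#include <chrono>
#include <cstdlib>
#include <filesystem>
#include <iostream>
#include <set>
#include <thread>

namespace fs = std::filesystem;

// Outer "marshalling" loop: watch a directory for newly arrived observation
// files and, for each one, hand off to the machine-level layer by submitting
// a batch job. Directory path, job script, and the use of Slurm's sbatch are
// all assumptions for illustration, not the paper's actual components.
int main() {
    const fs::path incoming = "/data/space_weather/incoming";  // hypothetical
    std::set<fs::path> seen;

    while (true) {
        if (fs::exists(incoming)) {
            for (const auto& entry : fs::directory_iterator(incoming)) {
                if (!seen.insert(entry.path()).second) continue;  // handled
                const std::string cmd =
                    "sbatch run_simulation.sh " + entry.path().string();
                std::cout << "new data, submitting: " << cmd << "\n";
                std::system(cmd.c_str());  // machine-level workflow takes over
            }
        }
        std::this_thread::sleep_for(std::chrono::seconds(30));
    }
}
```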
Citations: 0
Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms
Pub Date : 2021-09-13 DOI: 10.1007/978-3-030-90539-2_17
Derssie Mebratu, N. Hasabnis, Pietro Mercati, Gaurit Sharma, S. Najnin
{"title":"Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms","authors":"Derssie Mebratu, N. Hasabnis, Pietro Mercati, Gaurit Sharma, S. Najnin","doi":"10.1007/978-3-030-90539-2_17","DOIUrl":"https://doi.org/10.1007/978-3-030-90539-2_17","url":null,"abstract":"","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128074663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Negative Perceptions About the Applicability of Source-to-Source Compilers in HPC: A Literature Review
Pub Date : 2021-07-01 DOI: 10.1007/978-3-030-90539-2_16
Reed Milewicz, P. Pirkelbauer, Prema Soundararajan, H. Ahmed, A. Skjellum
{"title":"Negative Perceptions About the Applicability of Source-to-Source Compilers in HPC: A Literature Review","authors":"Reed Milewicz, P. Pirkelbauer, Prema Soundararajan, H. Ahmed, A. Skjellum","doi":"10.1007/978-3-030-90539-2_16","DOIUrl":"https://doi.org/10.1007/978-3-030-90539-2_16","url":null,"abstract":"","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129859667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Lettuce: PyTorch-based Lattice Boltzmann Framework
Pub Date : 2021-06-24 DOI: 10.1007/978-3-030-90539-2_3
Mario Bedrunka, D. Wilde, Martin L. Kliemank, D. Reith, H. Foysi, Andreas Krämer
{"title":"Lettuce: PyTorch-based Lattice Boltzmann Framework","authors":"Mario Bedrunka, D. Wilde, Martin L. Kliemank, D. Reith, H. Foysi, Andreas Krämer","doi":"10.1007/978-3-030-90539-2_3","DOIUrl":"https://doi.org/10.1007/978-3-030-90539-2_3","url":null,"abstract":"","PeriodicalId":345133,"journal":{"name":"ISC Workshops","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115676863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8