
International Journal of High Performance Computing Applications — Latest Publications

TwoFold: Highly accurate structure and affinity prediction for protein-ligand complexes from sequences
CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-10-30 | DOI: 10.1177/10943420231201151
Darren J Hsu, Hao Lu, Aditya Kashi, Michael Matheson, John Gounley, Feiyi Wang, Wayne Joubert, Jens Glaser
We describe our development of ab initio protein-ligand binding pose prediction models based on transformers and binding affinity prediction models based on the neural tangent kernel (NTK). By folding both protein and ligand, the TwoFold models achieve efficient, high-quality predictions that match state-of-the-art implementations while additionally reconstructing protein structures. Solving NTK models points to a new use case for highly optimized linear solver benchmarking codes on HPC.
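The closing remark, that solving NTK models becomes a use case for optimized linear solvers, can be illustrated with a minimal sketch: kernel ridge regression solved by conjugate gradient, the kind of symmetric positive-definite solve an NTK formulation leads to. The RBF kernel, sample points, and tolerances below are illustrative stand-ins, not the paper's actual model.

```python
import math

def rbf_kernel(xs, ys, gamma=1.0):
    # Dense kernel matrix; an NTK would be built from network gradients instead.
    return [[math.exp(-gamma * (a - b) ** 2) for b in ys] for a in xs]

def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    # Textbook CG for a symmetric positive-definite system A x = b.
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # residual b - A x for x = 0
    p = r[:]
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        step = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + step * p[i] for i in range(n)]
        r = [r[i] - step * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if math.sqrt(rs_new) < tol:
            break
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x

# Toy regression: fit y = sin(x) at a few points, then predict between them.
train_x = [0.0, 0.5, 1.0, 1.5, 2.0]
train_y = [math.sin(x) for x in train_x]
K = rbf_kernel(train_x, train_x)
for i in range(len(K)):
    K[i][i] += 1e-8               # small ridge term for numerical stability
coeffs = conjugate_gradient(K, train_y)
pred = sum(a * k for a, k in zip(coeffs, rbf_kernel([0.75], train_x)[0]))
print(round(pred, 5))
```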
Citations: 0
GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics
CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-10-27 | DOI: 10.1177/10943420231201154
Maxim Zvyagin, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, Defne G. Ozgulbas, Natalia Vassilieva, James Gregory Pauloski, Logan Ward, Valerie Hayot-Sasson, Murali Emani, Sam Foreman, Zhen Xie, Diangen Lin, Maulik Shukla, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan
We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) to genomic data, we build genome-scale language models (GenSLMs) that can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and fine-tuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify variants of concern. Thus, to our knowledge, GenSLMs represent one of the first whole-genome-scale foundation models that can generalize to other prediction tasks. We demonstrate scaling of GenSLMs on GPU-based supercomputers and AI-hardware accelerators, utilizing 1.63 Zettaflops in training runs with a sustained performance of 121 PFLOPS in mixed precision and a peak of 850 PFLOPS. We present initial scientific insights from examining GenSLMs in tracking the evolutionary dynamics of SARS-CoV-2, paving the path to realizing this on large biological data.
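A genome-scale language model first needs its nucleotide input turned into tokens. As a toy illustration of that preprocessing step (the actual GenSLMs tokenization may differ, and all names below are hypothetical), the sketch splits sequences into codon-sized k-mers and maps them to vocabulary ids:

```python
def kmer_tokenize(seq, k=3):
    # Split a nucleotide sequence into non-overlapping k-mer tokens,
    # a codon-level vocabulary a genome-scale LM might consume.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]

def build_vocab(corpus, k=3):
    # Assign each distinct k-mer an integer id, reserving special tokens.
    vocab = {"<pad>": 0, "<unk>": 1}
    for seq in corpus:
        for tok in kmer_tokenize(seq, k):
            vocab.setdefault(tok, len(vocab))
    return vocab

corpus = ["ATGGCGTAA", "ATGTTTGCGTGA"]
vocab = build_vocab(corpus)
ids = [vocab.get(t, vocab["<unk>"]) for t in kmer_tokenize("ATGGCGTGA")]
print(ids)
```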
Citations: 0
General framework for re-assuring numerical reliability in parallel Krylov solvers: A case of bi-conjugate gradient stabilized methods
CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-10-25 | DOI: 10.1177/10943420231207642
Roman Iakymchuk, Stef Graillat, José I. Aliaga
Parallel implementations of Krylov subspace methods often help to accelerate the process of finding an approximate solution of a linear system. However, such parallelization, coupled with asynchronous and out-of-order execution, often makes the impact of floating-point non-associativity more visible. These problems are amplified further when communication-hiding pipelined algorithms are used to improve the parallelization of Krylov subspace methods. Introducing reproducibility into the implementations avoids these problems by yielding more robust and correct solutions. This paper proposes a general framework for deriving reproducible and accurate variants of Krylov subspace methods. The proposed algorithmic strategies are reinforced by programmability suggestions to ensure deterministic and accurate executions. The framework is illustrated on the preconditioned BiCGStab method, a distinctive member of the Krylov subspace family, and its pipelined modification, for the solution of non-symmetric linear systems with message-passing. Finally, we verify the numerical behavior of the two reproducible variants of BiCGStab on a set of matrices from the SuiteSparse Matrix Collection and a 3D Poisson's equation.
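The floating-point non-associativity the abstract refers to is easy to reproduce: the same values summed in a different order generally give a different double-precision result, while an exactly rounded reduction (here Python's `math.fsum`, standing in for the paper's reproducible algorithms) is order-independent by construction.

```python
import math
import random

random.seed(0)
data = [random.uniform(-1e8, 1e8) for _ in range(10000)]
reordered = sorted(data)          # a different, deterministic summation order

# Naive left-to-right sums accumulate rounding error that depends on order;
# fsum returns the correctly rounded exact sum, so ordering cannot change it.
naive_a, naive_b = sum(data), sum(reordered)
exact_a, exact_b = math.fsum(data), math.fsum(reordered)
print(naive_a == naive_b, exact_a == exact_b)
```

In a message-passing solver, the reduction order varies with the process count and message arrival order, which is exactly why dot products and norms in BiCGStab become run-to-run irreproducible without such techniques.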
Citations: 0
Role-shifting threads: Increasing OpenMP malleability to address load imbalance at MPI and OpenMP
CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-10-21 | DOI: 10.1177/10943420231201153
Joel Criado, Victor Lopez, Joan Vinyals-Ylla-Catala, Guillem Ramirez-Miranda, Xavier Teruel, Marta Garcia-Gasulla
This paper presents the evolution of the free agent threads for OpenMP into the new role-shifting threads model and their integration with the Dynamic Load Balancing (DLB) library. We demonstrate how free agent threads can improve resource utilization in OpenMP applications with load imbalance in their nested parallel regions. We also demonstrate how DLB efficiently manages the malleability exposed by the role-shifting threads to address load imbalance issues. We use three real-world scientific applications: one demonstrates that free agents alone can improve the OpenMP model without external tools, and two MPI+OpenMP applications, one of them a coupling case, illustrate the potential of combining the free agent threads' malleability with an external resource manager to increase the efficiency of the system. In addition, we demonstrate that the new implementation is more usable than its predecessor, letting the runtime system automatically make decisions that were previously made by the programmer. All software is released open-source.
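The load-imbalance problem the paper attacks can be sketched with a toy makespan model: static assignment leaves workers idle behind one heavy task, while a greedy dynamic scheduler, which is roughly what a malleable thread team approximates when idle threads pick up new roles, keeps workers busy. This is an illustrative simulation, not the DLB API.

```python
import heapq

def makespan_static(task_costs, n_workers):
    # Round-robin static assignment: each worker's finish time is the
    # sum of its pre-assigned tasks; the slowest worker sets the makespan.
    loads = [0.0] * n_workers
    for i, c in enumerate(task_costs):
        loads[i % n_workers] += c
    return max(loads)

def makespan_dynamic(task_costs, n_workers):
    # Greedy dynamic scheduling: the least-loaded (soonest-idle) worker
    # takes the next pending task.
    loads = [0.0] * n_workers
    for c in task_costs:
        heapq.heappush(loads, heapq.heappop(loads) + c)
    return max(loads)

tasks = [8.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # one heavy task: imbalance
print(makespan_static(tasks, 4), makespan_dynamic(tasks, 4))
```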
Citations: 0
Efficient implementation of low-order-precision smoothed particle hydrodynamics
CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-09-14 | DOI: 10.1177/10943420231201144
Natsuki Hosono, Mikito Furuichi
The smoothed particle hydrodynamics (SPH) method is widely accepted as a flexible numerical treatment for surface boundaries and interactions. High-resolution simulations of hydrodynamic events require high-performance computing (HPC). There is a need for an SPH code that runs efficiently on modern supercomputers with accelerators such as NVIDIA or AMD graphics processing units. In this work, we applied half-precision, which is widely used in artificial intelligence, to the SPH method. However, improving HPC performance at such low-order precision is a challenge. An as-is half-precision implementation has a lower computational cost than float/double-precision simulations, but it also degrades simulation accuracy. We propose a scaling and shifting method that keeps the simulation accuracy near the level of float/double precision. By examining the impact of half-precision on simulation accuracy and time-to-solution, we demonstrate that half-precision can improve the computational performance of SPH simulations for scientific purposes without sacrificing accuracy. In addition, we demonstrate that the efficiency of half-precision depends on the architecture used.
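The scaling-and-shifting idea can be demonstrated with Python's native binary16 support: values clustered far from the origin lose almost all significand bits in a direct half cast, but normalizing them into a unit range first preserves accuracy. This is an illustrative sketch of the general approach, not the paper's exact formulation.

```python
import struct

def to_half(x):
    # Round-trip through IEEE 754 binary16 (struct format 'e', Python >= 3.6).
    return struct.unpack('e', struct.pack('e', x))[0]

def quantize_scaled(values):
    # Shift and scale into [0, 1] before the half cast, then undo the
    # transform afterwards, so the 11-bit significand covers the data range.
    lo, hi = min(values), max(values)
    scale = (hi - lo) or 1.0
    halves = [to_half((v - lo) / scale) for v in values]
    return [h * scale + lo for h in halves]

# Positions clustered far from the origin: a direct half cast is lossy,
# since the binary16 spacing near 1000.0 is 0.5.
values = [1000.0 + 0.001 * i for i in range(10)]
err_direct = max(abs(to_half(v) - v) for v in values)
err_scaled = max(abs(q - v) for q, v in zip(quantize_scaled(values), values))
print(err_direct > err_scaled)
```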
Citations: 0
Heterogeneous programming using OpenMP and CUDA/HIP for hybrid CPU-GPU scientific applications
IF 3.1 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-08-11 | DOI: 10.1177/10943420231188079
Marc Gonzalez Tallada, E. Morancho
Hybrid computer systems combine compute units (CUs) of different natures, such as CPUs, GPUs, and FPGAs. Simultaneously exploiting the computing power of these CUs requires a careful decomposition of applications into balanced parallel tasks according to both the performance of each CU type and the communication costs among them. This paper describes the design and implementation of runtime support for hybrid GPU-CPU OpenMP applications when mixed with GPU-oriented programming models (e.g., CUDA/HIP). The paper makes the case for a hybrid multi-level parallelization of the NPB-MZ benchmark suite. The implementation exploits both coarse-grain and fine-grain parallelism, mapped to compute units of different natures (GPUs and CPUs). The paper describes the implementation of runtime support to bridge OpenMP and HIP, introducing the abstractions of Computing Unit and Data Placement. We compare hybrid and non-hybrid executions under state-of-the-art OpenMP schedulers: static and dynamic task scheduling. Then, we extend the set of schedulers with two additional variants: a memorizing-dynamic task scheduling and a profile-based static task scheduling.
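A profile-based static scheduling of the kind mentioned above can be sketched as a proportional split of the iteration space by measured throughput; the throughput figures and the `partition_by_throughput` helper below are hypothetical, chosen only to show the balancing arithmetic.

```python
def partition_by_throughput(n_tasks, throughputs):
    # Profile-based static split: give each compute unit a share of the
    # iteration space proportional to its measured throughput, so all
    # units finish at roughly the same time.
    total = sum(throughputs.values())
    shares = {cu: int(n_tasks * t / total) for cu, t in throughputs.items()}
    # Hand any integer-rounding remainder to the fastest unit.
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += n_tasks - sum(shares.values())
    return shares

# Hypothetical profile: the GPU processes iterations 4x faster than the CPU.
shares = partition_by_throughput(1000, {"cpu": 1.0, "gpu": 4.0})
print(shares)
```

With this split, each unit's work divided by its throughput is equal (200/1 = 800/4), which is the balance condition a profile-based scheduler aims for.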
Citations: 0
Running ahead of evolution—AI-based simulation for predicting future high-risk SARS-CoV-2 variants
CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-07-29 | DOI: 10.1177/10943420231188077
Jie Chen, Zhiwei Nie, Yu Wang, Kai Wang, Fan Xu, Zhiheng Hu, Bing Zheng, Zhennan Wang, Guoli Song, Jingyi Zhang, Jie Fu, Xiansong Huang, Zhongqi Wang, Zhixiang Ren, Qiankun Wang, Daixi Li, Dongqing Wei, Bin Zhou, Chao Yang, Yonghong Tian
The never-ending emergence of SARS-CoV-2 variants of concern (VOCs) has challenged pandemic control worldwide. To develop effective drugs and vaccines, one needs to efficiently simulate SARS-CoV-2 spike receptor-binding domain (RBD) mutations and identify high-risk variants. We pretrain a large protein language model with approximately 408 million protein sequences and construct a high-throughput screening pipeline for the prediction of binding affinity and antibody escape. As the first work on SARS-CoV-2 RBD mutation simulation, we successfully identify mutations in the RBD regions of 5 VOCs and can screen millions of potential variants in seconds. Our workflow scales to 4096 NPUs with 96.5% scalability and a 493.9× speedup in mixed-precision computing, while achieving a peak performance of 366.8 PFLOPS (34.9% of the theoretical peak) on Pengcheng Cloudbrain-II. Our method paves the way for simulating coronavirus evolution in order to prepare for a future pandemic that will inevitably take place. Our models are released at https://github.com/ZhiweiNiepku/SARS-CoV-2_mutation_simulation to facilitate future related work.
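The inner loop of such a screen can be sketched as enumerating single-point mutants of a sequence and scoring each candidate; here the sequence fragment, the mutation naming, and `toy_score` are placeholders standing in for a real RBD and the learned affinity/escape predictor.

```python
REF = "NITNLCPFGEVFNAT"          # hypothetical stand-in for an RBD fragment
AMINO = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids

def single_mutants(seq):
    # Enumerate every single-point substitution, labeled in the usual
    # wild-type/position/mutant notation (e.g. "N1A").
    for i, wt in enumerate(seq):
        for aa in AMINO:
            if aa != wt:
                yield f"{wt}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]

def toy_score(seq):
    # Placeholder scorer: counts differences from the reference; a real
    # screen would call the trained binding-affinity model here.
    return sum(1 for a, b in zip(seq, REF) if a != b)

mutants = list(single_mutants(REF))
ranked = sorted(mutants, key=lambda m: toy_score(m[1]))
print(len(mutants))
```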
Citations: 0
Guest editors note: Special issue on clusters, clouds, and data for scientific computing
IF 3.1 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-07-01 | DOI: 10.1177/10943420231180188
J. Dongarra, B. Tourancheau
The research areas of cluster, cloud, and data analytics computing, which today provide fundamental infrastructure for all areas of advanced computational science, are being radically transformed by the convergence of at least two unprecedented trends. The first is the ongoing emergence of multicore and hybrid microprocessor designs, ushering in a new era of computing in which system designers must accept energy usage as a first-order constraint, and application designers must be able to exploit parallelism and data locality to an unprecedented degree. As the research community is rapidly becoming aware, the components of the traditional HPC software stack are poorly matched to the characteristics of systems based on these new architectures: hundreds of thousands of nodes, millions of cores, GPU accelerators, and reduced bandwidth and memory per core. The second trend is the dramatic escalation in the amount of data that leading-edge scientific applications, and the communities that use them, are either generating or trying to analyze. A key problem in such data-intensive science lies not only in the sheer volume of bits that must be processed and managed but also in the logistical problems of making the data of most current interest available to participants in large national and international collaborations, who sit in different administrative domains, are spread across the wide-area network, and want to use diverse resources: clusters, clouds, and data. This special issue gathers selected papers from the Workshop on Clusters, Clouds and Data for Scientific Computing (CCDSC), held at La Maison des Contes, 69490 Dareize, France, on September 6–9, 2022. This workshop continues a series of workshops started in 1992 entitled Workshop on Environments and Tools for Parallel Scientific Computing.
引用次数: 0
Parallel multithreaded deduplication of data sequences in nuclear structure calculations
IF 3.1 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2023-06-30 DOI: 10.1177/10943420231183697
D. Langr, T. Dytrych
High performance computing (HPC) applications that work with redundant sequences of data can benefit from their deduplication. We study this problem on the symmetry-adapted no-core shell model (SA-NCSM), where redundant sequences of different kinds naturally emerge in the data of the basis of the Hilbert space physically relevant to a modeled nucleus. For a fast solution of this problem on multicore architectures, we propose and present three multithreaded algorithms, which employ either concurrent hash tables or parallel sorting methods. Furthermore, we present evaluation and comparison of these algorithms based on experiments performed with real-world SA-NCSM calculations. The results indicate that the fastest option is to use a concurrent hash table, provided that it supports sequences of data as a type of table keys. If such a hash table is not available, the algorithm based on parallel sorting is a viable alternative.
Cited: 0
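The abstract above contrasts two deduplication strategies: using whole data sequences as keys in a (concurrent) hash table, versus sorting the sequences in parallel and then discarding adjacent duplicates. A minimal single-threaded sketch of the two ideas follows; it is illustrative only, not the paper's multithreaded SA-NCSM implementation, and the function names are invented:

```python
def dedup_hash(sequences):
    """Deduplicate by using each sequence (as a tuple) as a hash-table key.

    The first occurrence of each distinct sequence is kept, in input order.
    """
    seen = {}
    for seq in sequences:
        seen.setdefault(tuple(seq), seq)
    return list(seen.values())


def dedup_sort(sequences):
    """Deduplicate by sorting, then keeping elements that differ from their predecessor."""
    out = []
    for seq in sorted(sequences):
        if not out or out[-1] != seq:
            out.append(seq)
    return out


data = [[1, 2, 3], [4, 5], [1, 2, 3], [4, 5], [6]]
# Both strategies yield the same set of unique sequences.
assert sorted(dedup_hash(data)) == dedup_sort(data) == [[1, 2, 3], [4, 5], [6]]
```

The paper's finding — that the hash-table variant is fastest when the table supports sequence-valued keys, with parallel sorting as the fallback — maps onto this sketch directly: `dedup_hash` needs hashable composite keys (here, tuples), while `dedup_sort` only needs an ordering on sequences.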
INDIANA—In-Network Distributed Infrastructure for Advanced Network Applications
IF 3.1 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2023-06-26 DOI: 10.1177/10943420231179662
Sabra Ossen, Jeremy Musser, Luke Dalessandro, M. Swany
Data volumes are exploding as sensors proliferate and become more capable. Edge computing is envisioned as a path to distribute processing and reduce latency. Many models of Edge computing consider small devices running conventional software. Our model includes a more lightweight execution engine for network microservices and a network scheduling framework to configure network processing elements to process streams and direct the appropriate traffic to them. In this article, we describe INDIANA, a complete framework for in-network microservices. We will describe how the two components, the INDIANA network Processing Element (InPE) and the Flange Network Operating System (NOS), work together to achieve effective in-network processing to improve performance in edge-to-cloud environments. Our processing elements provide lightweight compute units optimized for efficient stream processing. These elements are customizable and vary in sophistication and resource consumption. The Flange NOS provides first-class flow-based reasoning to drive function placement, network configuration, and load balancing that can respond dynamically to network conditions. We describe design considerations and discuss our approach and implementations. We evaluate the performance of stream processing and examine the performance of several exemplar applications on networks of increasing scale and complexity.
Cited: 0
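The abstract above describes lightweight, customizable processing elements chained over data streams. As a rough illustration of that general idea (composable per-record stages applied lazily to a stream), here is a hypothetical sketch; `element`, `pipeline`, and the sample stages are invented names for illustration, not the INDIANA InPE or Flange API:

```python
def element(fn):
    """Wrap a per-record function as a streaming stage (a generator transformer).

    Returning None from fn drops the record, so a stage can filter as well as transform.
    """
    def stage(stream):
        for record in stream:
            out = fn(record)
            if out is not None:
                yield out
    return stage


def pipeline(*stages):
    """Compose stages left to right into a single stream transformer."""
    def run(stream):
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


# Example: filter, then transform, a stream of sensor readings near the edge.
drop_low = element(lambda r: r if r["value"] >= 10 else None)
to_celsius = element(lambda r: {**r, "value": (r["value"] - 32) * 5 / 9})

readings = [{"id": 1, "value": 50}, {"id": 2, "value": 5}, {"id": 3, "value": 212}]
print(list(pipeline(drop_low, to_celsius)(readings)))
# → [{'id': 1, 'value': 10.0}, {'id': 3, 'value': 100.0}]
```

Because each stage is a generator, records flow through one at a time rather than being buffered per stage, which mirrors the low-latency, per-record processing the abstract attributes to in-network elements.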