Journal of Parallel and Distributed Computing: Latest Articles

Teaching parallel and distributed computing in a single undergraduate-level course
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-04-29 | DOI: 10.1016/j.jpdc.2025.105092
Tia Newhall
As the application of parallel distributed computing (PDC) becomes ever more pervasive, it is increasingly important that undergraduate CS curricula expose students to a wide range of PDC topics in order to prepare them for the workforce. We present the curricular design and learning goals of an upper-level undergraduate course that covers a wide breadth of topics in parallel and distributed computing, while also providing students with depth of experience and development of problem solving, programming, and analysis skills. We discuss lessons learned from our experiences teaching this course over 15 years, and we discuss changes and improvements we have made in its offerings, as well as choices and trade-offs we made to achieve a balance between breadth and depth of coverage across these two huge fields. Evaluations from students support that our approach works well meeting the goals of exposing students to a broad range of PDC topics, building important PDC thinking and programming skills, and meeting other pedagogical goals of an advanced upper-level undergraduate CS course. Although initially designed as a single course due to constraints that are common to smaller schools, our experiences with this course lead us to conclude that it is a good approach for an advanced undergraduate course on PDC at any institution.
Volume 202, Article 105092
Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-04-25 | DOI: 10.1016/S0743-7315(25)00065-6
Volume 201, Article 105098
Citations: 0
Experience with adapting to a software framework for a use-case in computational science
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-04-24 | DOI: 10.1016/j.jpdc.2025.105090
V. Venkatesh Shenoi, Nisha Agrawal
The effective use of HPC infrastructure critically depends on the human resources involved in the maintenance and operation of these systems alongside the domain scientists and scientific programmers who develop scientific applications to leverage these systems. The workforce typically consists of undergraduates/postgraduates in different fields with broad areas of training in scientific computing and some programming skills with aptitude in HPC. However, there is a gap in the university-level curriculum and the skill set required to adapt to the requirements for developing scientific applications. Some efforts are there to fill this gap through workforce training programs to prepare the graduates for HPC jobs in industry/national labs. In this work, we share our experience training the workforce to adapt to AMReX (https://amrex-codes.github.io/amrex/docs_html/), a software framework developed under the Exascale computing project for scientific application development. It requires recapitulation of partial differential equations (PDEs), an indispensable mathematical model for describing physical systems across different scientific domains. We discuss our engagement with the intern, the trainees, and the development team in orienting them to scientific computing on the HPC platform, PDE solvers in particular. We highlight some of the features of the AMReX framework that helped the development team to contribute AMReX-based phase field solvers in the MicroSim phase field solver suite as a case study in adapting to the framework. These solvers can target different architectures without modifications due to the abstraction layer that provides immunity to developers for programming on different architectures. This experience can help to evolve a training model to build the HPC workforce.
Volume 202, Article 105090
Citations: 0
2-edge-Hamilton-connected dragonfly network
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-04-23 | DOI: 10.1016/j.jpdc.2025.105095
Huimei Guo, Rong-Xia Hao, Jie Wu
The dragonfly networks are being used in the supercomputers of today. It is of interest to study the topological properties of dragonfly networks. Let $G=(V(G),E(G))$ be a graph. Let $X$ be a subset of $\{uv : u,v \in V(G) \text{ and } u \neq v\}$ such that every component induced by $X$ on $V(G)$ is a path. If $|X| \le k$ and, after adding all edges in $X$ to $G$, the resulting graph contains a Hamiltonian cycle that includes all edges in $X$, then the graph $G$ is called $k$-edge-Hamilton-connected. This property can be used to design and optimize routing and forwarding algorithms. By finding such Hamiltonian cycle containing specific edges in the network, it can be ensured that every node can act as an intermediate node to forward packets through a specific channel, thus enabling efficient data transmission and routing. For $k=2$, determining whether a graph is $k$-edge-Hamilton-connected is a challenging problem, as it is known to be NP-complete. 2-edge-Hamilton-connected is an extension of Hamilton-connected. In this paper, we prove that the relative arrangement dragonfly network, a type of dragonfly network constructed by the global connections based on relative arrangements, is 2-edge-Hamilton-connected, and this property shows that dragonfly networks have strong reliability. In addition, we determined that $D(n,h,g)$ is 1-Hamilton-connected and paired 2-disjoint path coverable with $n \ge 4$ and $h \ge 2$.
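The definition in this abstract can be made concrete with a brute-force check on very small graphs. The sketch below is illustrative only and is not the authors' proof technique. It assumes the definition quantifies over every admissible set X with at most k pairs, labels vertices 0..n-1, and requires n >= 3. The function names are hypothetical.

```python
from itertools import combinations, permutations


def is_linear_forest(n, pairs):
    """True if every component formed by the given vertex pairs is a path."""
    degree = [0] * n
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for pair in pairs:
        u, v = tuple(pair)
        degree[u] += 1
        degree[v] += 1
        if degree[u] > 2 or degree[v] > 2:
            return False  # a path has maximum degree 2
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # the pair would close a cycle
        parent[ru] = rv
    return True


def has_hamiltonian_cycle_through(n, edges, required):
    """Brute force: try every vertex ordering that starts at vertex 0."""
    for perm in permutations(range(1, n)):
        cycle = (0,) + perm
        cycle_edges = {frozenset((cycle[i], cycle[(i + 1) % n])) for i in range(n)}
        if cycle_edges <= edges and required <= cycle_edges:
            return True
    return False


def is_k_edge_hamilton_connected(n, edge_list, k):
    """Check the definition for every linear forest X with |X| <= k (exponential time)."""
    edges = {frozenset(e) for e in edge_list}
    all_pairs = [frozenset(p) for p in combinations(range(n), 2)]
    for size in range(k + 1):
        for x in combinations(all_pairs, size):
            if not is_linear_forest(n, x):
                continue
            x_set = set(x)
            if not has_hamiltonian_cycle_through(n, edges | x_set, x_set):
                return False
    return True


if __name__ == "__main__":
    complete_5 = list(combinations(range(5), 2))
    cycle_5 = [(i, (i + 1) % 5) for i in range(5)]
    print(is_k_edge_hamilton_connected(5, complete_5, 2))  # True
    print(is_k_edge_hamilton_connected(5, cycle_5, 1))     # False: a chord forces a triangle
```

The search is exponential in the number of vertices, which is consistent with the hardness noted in the abstract; it is only meant to make the definition tangible on toy inputs.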
Volume 202, Article 105095
Citations: 0
Advanced resource management: A hands-on master course in HPC and cloud computing
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-04-23 | DOI: 10.1016/j.jpdc.2025.105091
Lucia Pons, Salvador Petit, Julio Sahuquillo
Resource management has become a major concern in dealing with performance and fairness in recent computing servers, including a wide variety of shared resources. To achieve high-performing and efficient systems, both hardware and software engineers must be thoroughly trained in effective resource management techniques. This paper introduces the GRE master course (Spanish acronym for Resource Management and Performance Evaluation in Cloud and High-Performance Workloads), which is being offered since Fall 2023. The course is taught by instructors with broad research expertise in resource management and performance evaluation. Subjects covered in this course include workload characterization, state-of-the-art resource management approaches, and performance evaluation tools and methodologies used in production systems. Management techniques are studied both in the context of HPC and cloud computing, where resource efficiency is becoming a primary concern. To enhance the learning experience, the course integrates theoretical concepts with a wide set of hands-on tasks carried out on recent real platforms. A real cloud virtualized environment is mimicked using typical software deployed in production systems such as Proxmox Virtual Environment. Students learn to use tools such as Linux Perf and Intel Vtune Profiler, which are commonly employed by researchers and practitioners to carry out typical tasks like performance bottleneck analysis from a microarchitectural perspective. Overall, the GRE course provides students with a solid foundation and skills in resource management by addressing current hot topics both in the industry and academia. Student satisfaction and learning outcomes prove the success of the GRE course and encourage us to continue in this direction.
Volume 202, Article 105091
Citations: 0
HIP-RRTMG_SW: Accelerating a shortwave radiative transfer scheme under the heterogeneous-compute interface for portability (HIP) framework
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-04-23 | DOI: 10.1016/j.jpdc.2025.105094
Zhenzhen Wang, Yuzhu Wang, Fei Li, Jinrong Jiang, Xiaocong Wang
With the development of higher-resolution atmospheric circulation models, the amount of calculation increases polynomially with resolution, and the calculation accuracy of physical processes is increasing rapidly. The traditional parallel computing methods based on multi-core CPUs can no longer meet the requirements of high efficiency and real-time computing performance of climate models. In order to improve the computational efficiency and scalability of the Atmospheric General Circulation Model, it is urgent to study efficient parallel algorithms and performance optimization methods for radiation physical process with massive calculations. In this paper, a heterogeneous multidimensional acceleration algorithm is proposed for the shortwave radiation transfer model (RRTMG_SW) based on HIP. Then, the HIP version of RRTMG_SW is developed, namely HIP-RRTMG_SW. In addition, combined with the “MPI + HIP” hybrid programming model, a multi-GPU implementation of RRTMG_SW is also proposed, and it makes full use of the multi-node, multi-core CPU and multi-GPU computing capability of a heterogeneous high performance computing system. Experimental results show that HIP-RRTMG_SW achieves 7.05× of acceleration in the climate simulation with 0.25° resolution using 16 AMD GPUs on the ORISE supercomputer compared with RRTMG_SW using 128 CPU cores. When using 1024 AMD GPUs, HIP-RRTMG_SW is 83.94× faster than RRTMG_SW with 128 CPU cores, indicating that the proposed multi-GPU acceleration algorithm has strong scalability.
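The two speedup figures quoted above can be related to each other with a little arithmetic. The sketch below assumes both figures refer to the same workload and the same 128-core CPU baseline, which the abstract does not state explicitly. The function name is hypothetical.

```python
def relative_scaling(speedup_small, gpus_small, speedup_large, gpus_large):
    """Return (speedup gain, relative parallel efficiency) between two GPU counts."""
    gain = speedup_large / speedup_small   # how much faster the larger run is
    ideal = gpus_large / gpus_small        # gain under perfect linear scaling
    return gain, gain / ideal


if __name__ == "__main__":
    gain, efficiency = relative_scaling(7.05, 16, 83.94, 1024)
    print(f"{gain:.2f}x faster with {1024 // 16}x more GPUs "
          f"(relative efficiency {efficiency:.1%})")
```

Under that assumption, moving from 16 to 1024 GPUs multiplies the GPU count by 64 and the speedup by roughly 11.9.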
Volume 202, Article 105094
Citations: 0
Editor's note
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-04-18 | DOI: 10.1016/j.jpdc.2025.105089
Ananth Kalyanaraman
Volume 202, Article 105089
Citations: 0
Efficient parameter tuning for a structure-based virtual screening HPC application
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-04-15 | DOI: 10.1016/j.jpdc.2025.105087
Bruno Guindani, Davide Gadioli, Roberto Rocco, Danilo Ardagna, Gianluca Palermo
Virtual screening applications are highly parameterized to optimize the balance between quality and execution performance. While output quality is critical, the entire screening process must be completed within a reasonable time. In fact, a slight reduction in output accuracy may be acceptable when dealing with large datasets. Finding the optimal quality-throughput trade-off depends on the specific HPC system used and should be re-evaluated with each new deployment or significant code update. This paper presents two parallel autotuning techniques for constrained optimization in distributed High-Performance Computing (HPC) environments. These techniques extend sequential Bayesian Optimization (BO) with two parallel asynchronous approaches, and they integrate predictions from Machine Learning (ML) models to help comply with constraints. Our target application is LiGen, a real-world virtual screening software for drug discovery. The proposed methods address two relevant challenges: efficient exploration of the parameter space and performance measurement using domain-specific metrics and procedures. We conduct an experimental campaign comparing the two methods with a popular state-of-the-art autotuner. Results show that our methods find configurations that are, on average, up to 35–42% better than the ones found by the autotuner and the default expert-picked LiGen configuration.
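The idea of asynchronous parallel tuning under a constraint can be sketched in a few lines. This is not the authors' method: uniform random proposals stand in for the Bayesian Optimization surrogate and acquisition step, and `evaluate` is a hypothetical stand-in for one application run that returns a quality score and a run time. All names and numbers are assumptions made for illustration.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def evaluate(config):
    """Hypothetical stand-in for one application run: returns (quality, seconds)."""
    x, y = config
    quality = -((x - 0.7) ** 2) - (y - 0.3) ** 2  # higher is better
    seconds = 1.0 + 4.0 * x                       # run time grows with x
    return quality, seconds


def async_constrained_search(n_evals=40, n_workers=4, time_limit=3.0, seed=0):
    """Keep n_workers evaluations in flight; return the best feasible result seen."""
    rng = random.Random(seed)
    best = None          # (quality, seconds, config) of the best feasible run
    pending = {}         # future -> config currently being evaluated
    submitted = 0

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Fill every worker once, then refill as soon as any worker finishes.
        while submitted < min(n_workers, n_evals):
            config = (rng.random(), rng.random())
            pending[pool.submit(evaluate, config)] = config
            submitted += 1
        while pending:
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                config = pending.pop(future)
                quality, seconds = future.result()
                feasible = seconds <= time_limit  # the constraint check
                if feasible and (best is None or quality > best[0]):
                    best = (quality, seconds, config)
                if submitted < n_evals:
                    new_config = (rng.random(), rng.random())
                    pending[pool.submit(evaluate, new_config)] = new_config
                    submitted += 1
    return best


if __name__ == "__main__":
    print(async_constrained_search())
```

A model-based tuner would replace the random proposal with one chosen from the results gathered so far; the asynchronous refill loop would stay the same.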
Volume 202, Article 105087
Citations: 0
Schedule multi-instance microservices to minimize response time under budget constraint in cloud HPC systems
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-04-08 | DOI: 10.1016/j.jpdc.2025.105086
Dong Wang, Hong Shen, Hui Tian, Yuanhao Yang
In the emerging microservice-based architecture of cloud HPC systems, a challenging problem of critical importance for system service capability is how we can schedule microservices to minimize the end-to-end response time for user requests while keeping cost within the specified budget. We address this problem for multi-instance microservices requested by a single application to which no existing result is known to our knowledge. We propose an effective two-stage solution of first allocating budget (resources) to microservices within the budget constraint and then deploying microservice instances on servers to minimize system operational overhead. For budget allocation, we formulate it as the Discrete Time Cost Tradeoff (DTCT) problem which is NP-hard, present a linear program (LP) based algorithm, and provide a rigorous proof of its worst-case performance guarantee of 4 from the optimal solution. For microservice deployment, we show that it is harder than the NP-hard problem of 1-D binpacking through establishing its mathematical model, and propose a heuristic algorithm of Least First Mapping that greedily places microservice instances on fewest possible servers to minimize system operation cost. The experiment results of extensive simulations on DAG-based applications of different sizes demonstrate the superior performance of our algorithm in comparison with the existing approaches.
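The deployment stage described above is related to 1-D bin packing, so a classic greedy packing heuristic gives a feel for the problem. The sketch below uses first-fit decreasing, a standard textbook heuristic, as a stand-in; the abstract does not specify how Least First Mapping works, so this should not be read as the authors' algorithm. Demands and capacity are hypothetical integer resource units.

```python
def first_fit_decreasing(demands, capacity):
    """Greedily place instances on as few servers as the heuristic can manage.

    demands  -- resource demand of each microservice instance
    capacity -- resource capacity of every (identical) server
    Returns a list of servers, each a list of instance indices.
    """
    servers = []  # each entry: [remaining capacity, [instance indices]]
    order = sorted(range(len(demands)), key=lambda i: demands[i], reverse=True)
    for i in order:
        demand = demands[i]
        if demand > capacity:
            raise ValueError(f"instance {i} does not fit on any server")
        for server in servers:
            if server[0] >= demand:      # first server with enough room
                server[0] -= demand
                server[1].append(i)
                break
        else:
            servers.append([capacity - demand, [i]])  # open a new server
    return [indices for _, indices in servers]


if __name__ == "__main__":
    placement = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
    print(len(placement), placement)
```

Placing the largest instances first tends to leave small gaps that the remaining small instances can fill, which is why the decreasing order is commonly used.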
Journal of Parallel and Distributed Computing, vol. 202, Article 105086, published 2025-04-08.
Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4 Zone 3, Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-04-06 DOI: 10.1016/S0743-7315(25)00041-3
Journal of Parallel and Distributed Computing, vol. 200, Article 105074, published 2025-04-06.
Citations: 0