Journal of Parallel and Distributed Computing: Latest Articles

Advanced resource management: A hands-on master course in HPC and cloud computing
IF 3.4; Zone 3 (Computer Science); Q1 COMPUTER SCIENCE, THEORY & METHODS; Pub Date: 2025-08-01; Epub Date: 2025-04-23; DOI: 10.1016/j.jpdc.2025.105091
Lucia Pons, Salvador Petit, Julio Sahuquillo
Resource management has become a major concern for performance and fairness in recent computing servers, which include a wide variety of shared resources. To achieve high-performing and efficient systems, both hardware and software engineers must be thoroughly trained in effective resource management techniques. This paper introduces the GRE master course (Spanish acronym for Resource Management and Performance Evaluation in Cloud and High-Performance Workloads), which has been offered since Fall 2023. The course is taught by instructors with broad research expertise in resource management and performance evaluation. Subjects covered in this course include workload characterization, state-of-the-art resource management approaches, and performance evaluation tools and methodologies used in production systems. Management techniques are studied in the context of both HPC and cloud computing, where resource efficiency is becoming a primary concern. To enhance the learning experience, the course integrates theoretical concepts with a wide set of hands-on tasks carried out on recent real platforms. A real virtualized cloud environment is mimicked using typical software deployed in production systems, such as Proxmox Virtual Environment. Students learn to use tools such as Linux perf and Intel VTune Profiler, which are commonly employed by researchers and practitioners to carry out typical tasks like performance bottleneck analysis from a microarchitectural perspective. Overall, the GRE course provides students with a solid foundation and skills in resource management by addressing current hot topics in both industry and academia. Student satisfaction and learning outcomes prove the success of the GRE course and encourage us to continue in this direction.
Volume 202, Article 105091
Citations: 0
Editor's note
IF 3.4; Zone 3 (Computer Science); Q1 COMPUTER SCIENCE, THEORY & METHODS; Pub Date: 2025-08-01; Epub Date: 2025-04-18; DOI: 10.1016/j.jpdc.2025.105089
Ananth Kalyanaraman
Volume 202, Article 105089
Citations: 0
Efficient parameter tuning for a structure-based virtual screening HPC application
IF 3.4; Zone 3 (Computer Science); Q1 COMPUTER SCIENCE, THEORY & METHODS; Pub Date: 2025-08-01; Epub Date: 2025-04-15; DOI: 10.1016/j.jpdc.2025.105087
Bruno Guindani, Davide Gadioli, Roberto Rocco, Danilo Ardagna, Gianluca Palermo
Virtual screening applications are highly parameterized to optimize the balance between quality and execution performance. While output quality is critical, the entire screening process must be completed within a reasonable time. In fact, a slight reduction in output accuracy may be acceptable when dealing with large datasets. Finding the optimal quality-throughput trade-off depends on the specific HPC system used and should be re-evaluated with each new deployment or significant code update. This paper presents two parallel autotuning techniques for constrained optimization in distributed High-Performance Computing (HPC) environments. These techniques extend sequential Bayesian Optimization (BO) with two parallel asynchronous approaches, and they integrate predictions from Machine Learning (ML) models to help comply with constraints. Our target application is LiGen, a real-world virtual screening software for drug discovery. The proposed methods address two relevant challenges: efficient exploration of the parameter space and performance measurement using domain-specific metrics and procedures. We conduct an experimental campaign comparing the two methods with a popular state-of-the-art autotuner. Results show that our methods find configurations that are, on average, up to 35–42% better than the ones found by the autotuner and the default expert-picked LiGen configuration.
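The constrained quality-throughput trade-off described in this abstract can be illustrated with a toy sketch. This is not the paper's method: plain random search stands in for the parallel Bayesian Optimization the authors describe, and the `evaluate` cost model, the parameter names `effort` and `batch`, and the 15-second budget are all hypothetical placeholders rather than LiGen parameters.

```python
import random


def evaluate(config):
    """Hypothetical stand-in for one screening run: returns (quality, runtime_s)."""
    effort = config["effort"]  # more effort -> better quality, longer runtime
    batch = config["batch"]    # larger batches -> shorter runtime
    quality = 1.0 - 1.0 / (1.0 + effort)
    runtime_s = 10.0 * effort / batch
    return quality, runtime_s


def tune(n_trials, max_runtime_s, seed=0):
    """Random search for the best-quality config that satisfies the runtime constraint."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = {
            "effort": rng.uniform(1.0, 20.0),
            "batch": rng.choice([1, 2, 4, 8]),
        }
        quality, runtime_s = evaluate(config)
        if runtime_s > max_runtime_s:
            continue  # constraint violated: discard this configuration
        if best is None or quality > best[1]:
            best = (config, quality, runtime_s)
    return best


best = tune(n_trials=200, max_runtime_s=15.0)
print(best)
```

A Bayesian optimizer would replace the uniform sampling with a surrogate model that proposes the next configuration, and a constraint model could veto configurations predicted to exceed the budget before they are run.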
Volume 202, Article 105087
Citations: 0
A knowledge-driven approach to multi-objective IoT task graph scheduling in fog-cloud computing
IF 3.4; Zone 3 (Computer Science); Q1 COMPUTER SCIENCE, THEORY & METHODS; Pub Date: 2025-08-01; Epub Date: 2025-03-18; DOI: 10.1016/j.jpdc.2025.105069
Hadi Gholami, Hongyang Sun
Despite the significant growth of the Internet of Things (IoT), this emerging technology has prominent limitations, such as limited processing power and storage. Along with the expansion of IoT networks, the fog-cloud computing paradigm has been developed to optimize the provision of services to IoT users by offloading computations to more powerful processing resources. In this paper, with the aim of optimizing the multiple objectives of makespan, energy consumption, and cost, we develop a novel automatic three-module algorithm to schedule multiple task graphs offloaded from IoT devices to the fog-cloud environment. Our algorithm combines the Genetic Algorithm (GA) and the Random Forest (RF) classifier, which we call Hybrid GA-RF (HGARF). Each of the three modules has its own responsibility, and the modules are repeated sequentially to extract knowledge from the solution space in the form of IF-THEN rules. The first module is responsible for generating solutions for the training set using a GA. Here, we introduce a chromosome encoding method and a crossover operator to create diversity for multiple task graphs. By expressing a concept called bottleneck and two conditions, we also develop a mutation operator to identify and reduce the workload of certain processing centers. The second module aims at generating rules from the solutions of the training set, and to that end employs an RF classifier. Here, in addition to proposing features to construct decision trees, we develop a format for extracting and recording IF-THEN rules. The third module checks the quality of the generated rules and refines them by predicting the processing resources as well as removing less important rules from the rule set. Finally, the developed HGARF algorithm automatically determines its termination condition based on the quality of the provided solutions. Experimental results demonstrate that our method effectively improves the objective functions in large-size task graphs by up to 13.24% compared to some state-of-the-art methods.
Volume 202, Article 105069
Citations: 0
HIP-RRTMG_SW: Accelerating a shortwave radiative transfer scheme under the heterogeneous-compute interface for portability (HIP) framework
IF 3.4; Zone 3 (Computer Science); Q1 COMPUTER SCIENCE, THEORY & METHODS; Pub Date: 2025-08-01; Epub Date: 2025-04-23; DOI: 10.1016/j.jpdc.2025.105094
Zhenzhen Wang, Yuzhu Wang, Fei Li, Jinrong Jiang, Xiaocong Wang
With the development of higher-resolution atmospheric circulation models, the amount of calculation increases polynomially with resolution, and the calculation accuracy of physical processes is increasing rapidly. The traditional parallel computing methods based on multi-core CPUs can no longer meet the requirements of high efficiency and real-time computing performance of climate models. In order to improve the computational efficiency and scalability of the Atmospheric General Circulation Model, it is urgent to study efficient parallel algorithms and performance optimization methods for radiation physical processes with massive calculations. In this paper, a heterogeneous multidimensional acceleration algorithm is proposed for the shortwave radiation transfer model (RRTMG_SW) based on HIP. Then, the HIP version of RRTMG_SW is developed, namely HIP-RRTMG_SW. In addition, combined with the “MPI + HIP” hybrid programming model, a multi-GPU implementation of RRTMG_SW is also proposed, which makes full use of the multi-node, multi-core CPU and multi-GPU computing capability of a heterogeneous high-performance computing system. Experimental results show that HIP-RRTMG_SW achieves a 7.05× speedup in the climate simulation with 0.25° resolution using 16 AMD GPUs on the ORISE supercomputer compared with RRTMG_SW using 128 CPU cores. When using 1024 AMD GPUs, HIP-RRTMG_SW is 83.94× faster than RRTMG_SW with 128 CPU cores, indicating that the proposed multi-GPU acceleration algorithm has strong scalability.
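For readers who want to relate the two reported speedups to each other, the following is a small illustrative calculation and not part of the paper. It assumes both figures were measured against the same 128-core CPU baseline on the same problem, which the abstract suggests but does not state explicitly for the 1024-GPU run; the function name `relative_scaling` is hypothetical.

```python
def relative_scaling(gpus_a, speedup_a, gpus_b, speedup_b):
    """Compare two runs that share one baseline.

    Returns (resource_ratio, speedup_ratio, relative_efficiency), where
    relative_efficiency is the speedup gain divided by the GPU-count gain.
    """
    resource_ratio = gpus_b / gpus_a
    speedup_ratio = speedup_b / speedup_a
    return resource_ratio, speedup_ratio, speedup_ratio / resource_ratio


# Figures quoted in the abstract: 7.05x on 16 GPUs and 83.94x on 1024 GPUs.
resource_ratio, speedup_ratio, efficiency = relative_scaling(16, 7.05, 1024, 83.94)
print(f"GPU count grows {resource_ratio:.0f}x, speedup grows {speedup_ratio:.2f}x, "
      f"relative efficiency {efficiency:.3f}")
```

The ratio is only meaningful under the stated assumption; if the larger run used a different resolution or workload, the two speedups are not directly comparable.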
Volume 202, Article 105094
Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4; Zone 3 (Computer Science); Q1 COMPUTER SCIENCE, THEORY & METHODS; Pub Date: 2025-08-01; Epub Date: 2025-05-21; DOI: 10.1016/S0743-7315(25)00079-6
Volume 202, Article 105112
Citations: 0
Experience with adapting to a software framework for a use-case in computational science
IF 3.4; Zone 3 (Computer Science); Q1 COMPUTER SCIENCE, THEORY & METHODS; Pub Date: 2025-08-01; Epub Date: 2025-04-24; DOI: 10.1016/j.jpdc.2025.105090
V. Venkatesh Shenoi, Nisha Agrawal
The effective use of HPC infrastructure critically depends on the human resources involved in the maintenance and operation of these systems alongside the domain scientists and scientific programmers who develop scientific applications to leverage these systems. The workforce typically consists of undergraduates/postgraduates in different fields with broad areas of training in scientific computing and some programming skills with aptitude in HPC. However, there is a gap in the university-level curriculum and the skill set required to adapt to the requirements for developing scientific applications. Some efforts are there to fill this gap through workforce training programs to prepare the graduates for HPC jobs in industry/national labs. In this work, we share our experience training the workforce to adapt to AMReX (https://amrex-codes.github.io/amrex/docs_html/), a software framework developed under the Exascale computing project for scientific application development. It requires recapitulation of partial differential equations (PDEs), an indispensable mathematical model for describing physical systems across different scientific domains. We discuss our engagement with the intern, the trainees, and the development team in orienting them to scientific computing on the HPC platform, PDE solvers in particular. We highlight some of the features of the AMReX framework that helped the development team to contribute AMReX-based phase field solvers in the MicroSim phase field solver suite as a case study in adapting to the framework. These solvers can target different architectures without modifications due to the abstraction layer that provides immunity to developers for programming on different architectures. This experience can help to evolve a training model to build the HPC workforce.
Volume 202, Article 105090
Citations: 0
Teaching parallel and distributed computing in a single undergraduate-level course
IF 3.4; Zone 3 (Computer Science); Q1 COMPUTER SCIENCE, THEORY & METHODS; Pub Date: 2025-08-01; Epub Date: 2025-04-29; DOI: 10.1016/j.jpdc.2025.105092
Tia Newhall
As the application of parallel distributed computing (PDC) becomes ever more pervasive, it is increasingly important that undergraduate CS curricula expose students to a wide range of PDC topics in order to prepare them for the workforce. We present the curricular design and learning goals of an upper-level undergraduate course that covers a wide breadth of topics in parallel and distributed computing, while also providing students with depth of experience and development of problem solving, programming, and analysis skills. We discuss lessons learned from our experiences teaching this course over 15 years, and we discuss changes and improvements we have made in its offerings, as well as choices and trade-offs we made to achieve a balance between breadth and depth of coverage across these two huge fields. Evaluations from students support that our approach works well meeting the goals of exposing students to a broad range of PDC topics, building important PDC thinking and programming skills, and meeting other pedagogical goals of an advanced upper-level undergraduate CS course. Although initially designed as a single course due to constraints that are common to smaller schools, our experiences with this course lead us to conclude that it is a good approach for an advanced undergraduate course on PDC at any institution.
随着并行分布式计算(PDC)的应用变得越来越普遍,本科CS课程让学生接触到广泛的PDC主题,以便为他们的工作做好准备,这一点变得越来越重要。我们提出了一个高级本科课程的课程设计和学习目标,该课程涵盖了并行和分布式计算的广泛主题,同时也为学生提供了解决问题、编程和分析技能的深度经验和发展。我们讨论了15年来我们教授这门课程的经验教训,我们讨论了我们在课程中所做的改变和改进,以及我们为实现这两个巨大领域的广度和深度之间的平衡而做出的选择和权衡。学生的评价支持我们的方法很好地满足了让学生接触广泛的PDC主题,培养重要的PDC思维和编程技能,以及满足高级本科CS课程的其他教学目标的目标。虽然由于小型学校的限制,最初设计为单一课程,但我们对这门课程的经验使我们得出结论,对于任何机构的PDC高级本科课程来说,这都是一个很好的方法。
{"title":"Teaching parallel and distributed computing in a single undergraduate-level course","authors":"Tia Newhall","doi":"10.1016/j.jpdc.2025.105092","DOIUrl":"10.1016/j.jpdc.2025.105092","url":null,"abstract":"<div><div>As the application of parallel distributed computing (PDC) becomes ever more pervasive, it is increasingly important that undergraduate CS curricula expose students to a wide range of PDC topics in order to prepare them for the workforce. We present the curricular design and learning goals of an upper-level undergraduate course that covers a wide breadth of topics in parallel and distributed computing, while also providing students with depth of experience and development of problem solving, programming, and analysis skills. We discuss lessons learned from our experiences teaching this course over 15 years, and we discuss changes and improvements we have made in its offerings, as well as choices and trade-offs we made to achieve a balance between breadth and depth of coverage across these two huge fields. Evaluations from students support that our approach works well meeting the goals of exposing students to a broad range of PDC topics, building important PDC thinking and programming skills, and meeting other pedagogical goals of an advanced upper-level undergraduate CS course. 
Although initially designed as a single course due to constraints that are common to smaller schools, our experiences with this course lead us to conclude that it is a good approach for an advanced undergraduate course on PDC at any institution.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105092"},"PeriodicalIF":3.4,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143912437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Teaching parallel and distributed computing using data-intensive computing modules
IF 3.4 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-08-01 Epub Date: 2025-05-12 DOI: 10.1016/j.jpdc.2025.105093
Michael Gowanlock
Parallel and distributed computing (PDC) courses are useful for computer science (CS) and domain science students. For CS students, PDC is a fundamental field that examines concepts relating to a range of CS subfields, such as algorithms, architecture, simulation, software, and systems. Students with domain science backgrounds also require PDC to carry out their research objectives, and the ongoing data revolution has intensified this need. Given the rise of data science and other data-enabled computational fields, we propose several data-intensive pedagogic modules that are used to teach PDC using message-passing programming with the Message Passing Interface (MPI). These modules employ activities that are interesting, relevant, and accessible to both computer and domain science students enrolled in graduate-level programs.
Using pre- and post-module completion quizzes and anonymous free response surveys, we evaluated the efficacy of the pedagogic modules across four cohorts of students enrolled in a graduate-level High Performance Computing (HPC) course at Northern Arizona University. The students have diverse educational backgrounds, as some were enrolled in programs outside of CS. These programs include electrical and computer engineering, mechanical engineering, astronomy & planetary science, bioinformatics, and ecoinformatics. Despite the multi-disciplinary backgrounds of the students, we find that the hands-on application-driven approach to teaching PDC was successful at helping students learn core PDC concepts, and that the modules are useful for facilitating online learning, which was required during the COVID-19 pandemic.
Citations: 0
Embedded scaffolding for teaching and assessing inquiry-based hands-on laboratory on distributed systems
IF 3.4 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-07-01 Epub Date: 2025-04-03 DOI: 10.1016/j.jpdc.2025.105082
Jordi Guitart

Context

Information Technology education must cultivate proficiency in distributed systems, including strong hands-on laboratory skills, to meet the needs of society and industry. Given the complexity of distributed systems, any successful methodology for teaching them to novice students must be scaffolded appropriately to ensure that students acquire the required degree of expertise.

Objective

We propose a comprehensive scaffolding approach for an inquiry-based hands-on laboratory in a distributed systems course, which guides not only the learning process but also its assessment. The approach is based mainly on embedded scaffolds, namely explicit coding and experimental milestones and open questions with predefined grades, but also features contingent scaffolds that the teacher provides when additional assistance is needed.

Method

We apply the methodology in the context of the subject ‘Distributed Network Systems’ offered by our university. We compare students' performance across the three academic courses that used the proposed methodology against the three preceding courses that still used the former one. We use both visual representations and planned Analysis of Variance (ANOVA) tests to test our hypothesis, which is defined as a complex contrast.

Findings

We find that there is a statistically significant improvement in the students' performance when using the new methodology, both in their grades of the assignments ($F(1, 75.364) = 17.770$, $p = 6.85 \times 10^{-5}$) and, more importantly, also in their grades of the exam questions about the practicals ($F(1, 123.186) = 13.285$, $p = 3.93 \times 10^{-4}$).

Implications

Our results encourage other instructors to incorporate embedded scaffolds for teaching and assessing their hands-on laboratories on distributed systems.
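The F statistics and p-values quoted under Findings can be cross-checked with a few lines of arithmetic. The sketch below is not the author's analysis procedure, which the abstract does not describe. It only assumes that each reported p-value is the upper tail of an F distribution with 1 numerator degree of freedom and the stated (fractional) error degrees of freedom. Under that assumption F equals the square of a Student's t variable, so the p-value is the two-sided t tail, computed here by Simpson integration using only the standard library. The function names `t_pdf` and `f_test_p_value` are hypothetical helpers introduced for illustration.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Work in log space so the gamma functions cannot overflow for large df.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def f_test_p_value(f_stat: float, df_error: float, steps: int = 20000) -> float:
    """Upper-tail p-value of an F(1, df_error) statistic.

    Assumption: 1 numerator degree of freedom, so F = t**2 and the
    p-value is the two-sided tail probability of t = sqrt(F).
    """
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    t = math.sqrt(f_stat)
    h = t / steps
    # Composite Simpson's rule for the integral of the density over [0, t].
    total = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    central = total * h / 3
    # The density is symmetric, so both tails together are 1 - 2 * central.
    return 1.0 - 2.0 * central


if __name__ == "__main__":
    # Values taken from the Findings paragraph of the abstract.
    print(f_test_p_value(17.770, 75.364))   # assignment grades
    print(f_test_p_value(13.285, 123.186))  # exam questions on the practicals
```

If the assumption about the reference distribution holds, both printed values should land close to the p-values reported in the abstract, which also confirms that the exponents are negative.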
Citations: 0