Journal of Parallel and Distributed Computing: Latest Articles

ConCeal: A Winograd convolution code template for optimising GCU in parallel
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-05-21 | DOI: 10.1016/j.jpdc.2025.105108
Tian Chen, Yu-an Tan, Thar Baker, Haokai Wu, Qiuyu Zhang, Yuanzhang Li
By minimising arithmetic operations, Winograd convolution substantially reduces the computational complexity of convolution, a pivotal operation in the training and inference stages of Convolutional Neural Networks (CNNs). This study leverages the hardware architecture and capabilities of Shanghai Enflame Technology's AI accelerator, the General Computing Unit (GCU). We develop a code template named ConCeal for Winograd convolution with 3 × 3 kernels, employing a set of interrelated optimisations, including task partitioning, memory layout design, and parallelism. These optimisations fully exploit GCU's computing resources by optimising dataflow and parallelising the execution of tasks on GCU cores, thereby enhancing Winograd convolution. Moreover, the integrated optimisations in the template are efficiently applicable to other operators, such as max pooling. Using this template, we implement and assess the performance of four Winograd convolution operators on GCU. The experimental results show that the ConCeal operators achieve a maximum speedup of 2.04× and an average speedup of 1.49× compared to the fastest GEMM-based convolution implementations on GCU. Additionally, the ConCeal operators demonstrate competitive or superior computing resource utilisation in certain ResNet and VGG convolution layers when compared to cuDNN on RTX2080.
Journal of Parallel and Distributed Computing, vol. 203, Article 105108
Citations: 0
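The arithmetic saving that the ConCeal abstract attributes to Winograd convolution can be seen in the smallest standard case, F(2,3): two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The sketch below is a plain-Python illustration of that textbook 1D identity only. It is not the ConCeal template and contains no GCU-specific code, and the function names `winograd_f23` and `direct_f23` are hypothetical.

```python
from typing import Sequence, Tuple


def direct_f23(d: Sequence[float], g: Sequence[float]) -> Tuple[float, float]:
    """Reference: two outputs of a 3-tap filter, using 6 multiplications."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return y0, y1


def winograd_f23(d: Sequence[float], g: Sequence[float]) -> Tuple[float, float]:
    """Winograd F(2,3): the same two outputs with 4 data-by-filter products.

    The filter-side terms (u1, u2) depend only on g, so in a real kernel
    they would be precomputed once per filter (an assumption of this sketch).
    """
    if len(d) != 4 or len(g) != 3:
        raise ValueError("F(2,3) expects 4 input values and 3 filter taps")

    # Filter transform (can be precomputed).
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2

    # The four multiplications.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * g[2]

    # Output transform: additions and subtractions only.
    return m1 + m2 + m3, m2 - m3 - m4


if __name__ == "__main__":
    data, taps = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0]
    print(direct_f23(data, taps), winograd_f23(data, taps))
```

For 3 × 3 kernels the same idea is nested in two dimensions; how ConCeal tiles and lays out those transforms on GCU is described in the paper, not here.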
Thermal modeling and optimal allocation of avionics safety-critical tasks on heterogeneous MPSoCs
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-05-20 | DOI: 10.1016/j.jpdc.2025.105107
Zdeněk Hanzálek, Ondřej Benedikt, Přemysl Šůcha, Pavel Zaykov, Michal Sojka
Multi-Processor Systems-on-Chip (MPSoCs) can deliver the high performance needed in many industrial domains, including aerospace. However, their high power consumption, combined with avionics safety standards, brings new thermal management challenges. This paper investigates techniques for offline thermal-aware allocation of periodic tasks on heterogeneous MPSoCs running at a fixed clock frequency, as required in avionics. The goal is to find the assignment of tasks to (i) cores and (ii) temporal isolation windows, as required by the ARINC 653 standard, while minimizing the MPSoC temperature. To achieve that, we formulate a new optimization problem, prove its NP-hardness, and identify a subproblem that is solvable in polynomial time. Furthermore, we propose and analyze three power models, and integrate them within several novel optimization approaches based on heuristics, a black-box optimizer, and Integer Linear Programming (ILP). We perform the experimental evaluation on three popular MPSoC platforms (NXP i.MX8QM MEK, NXP i.MX8QM Ixora, NVIDIA TX2) and observe a difference of up to 5.5 °C among the tested methods (corresponding to a 22% reduction w.r.t. the ambient temperature). We also show that our method, integrating the empirical power model with the ILP, outperforms the other methods on all tested platforms.
Journal of Parallel and Distributed Computing, vol. 203, Article 105107
Citations: 0
Optimal scheduling algorithms for software-defined radio pipelined and replicated task chains on multicore architectures
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-05-16 | DOI: 10.1016/j.jpdc.2025.105106
Diane Orhan, Laércio Lima Pilla, Denis Barthou, Adrien Cassagne, Olivier Aumage, Romain Tajan, Christophe Jégo, Camille Leroux
Software-Defined Radio (SDR) represents a move from dedicated hardware to software implementations of digital communication standards. This approach offers flexibility, shorter time to market, maintainability, and lower costs, but it requires an optimized distribution of tasks in order to meet performance requirements. Thus, we study the problem of scheduling SDR linear task chains of stateless and stateful tasks for streaming processing. We model this problem as a pipelined workflow scheduling problem based on pipelined and replicated parallelism on homogeneous resources. We propose an optimal dynamic programming solution and an optimal greedy algorithm named OTAC for maximizing throughput while also minimizing resource utilization. Moreover, the optimality of the proposed scheduling algorithm is proved. We evaluate our solutions and compare their execution times and schedules to other algorithms using synthetic task chains and an implementation of the DVB-S2 communication standard on the AFF3CT SDR Domain Specific Language. Our results demonstrate how OTAC quickly finds optimal schedules, leading consistently to better results than other algorithms, or equivalent results with fewer resources.
Journal of Parallel and Distributed Computing, vol. 204, Article 105106
Citations: 0
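To make the scheduling model in the SDR abstract concrete, the sketch below evaluates a given schedule of a linear task chain under a commonly used pipelined-and-replicated model. It assumes that consecutive tasks are grouped into stages, that a stage with only stateless tasks may be replicated on several identical cores, and that throughput is the inverse of the slowest stage's period. It only scores a schedule; it is not OTAC or the paper's dynamic programming solution, and the names `Task`, `stage_period`, `schedule_period`, and `cores_used` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Task:
    weight: float   # processing time per frame, in arbitrary units
    stateful: bool  # assumed: stateful tasks cannot be replicated


# A stage is (start index, end index exclusive, number of replicas/cores).
Stage = Tuple[int, int, int]


def stage_period(tasks: List[Task], replicas: int) -> float:
    """Average time between frames leaving one stage."""
    if replicas < 1:
        raise ValueError("a stage needs at least one core")
    if replicas > 1 and any(t.stateful for t in tasks):
        raise ValueError("a stage with a stateful task cannot be replicated")
    return sum(t.weight for t in tasks) / replicas


def schedule_period(chain: List[Task], stages: List[Stage]) -> float:
    """Period of the whole pipeline: the slowest stage dominates."""
    expected_start, worst = 0, 0.0
    for start, end, replicas in stages:
        if start != expected_start or end <= start:
            raise ValueError("stages must cover the chain contiguously")
        worst = max(worst, stage_period(chain[start:end], replicas))
        expected_start = end
    if expected_start != len(chain):
        raise ValueError("stages must cover the whole chain")
    return worst


def cores_used(stages: List[Stage]) -> int:
    return sum(replicas for _, _, replicas in stages)


if __name__ == "__main__":
    chain = [Task(2.0, True), Task(6.0, False), Task(1.0, True)]
    stages = [(0, 1, 1), (1, 2, 3), (2, 3, 1)]
    print(schedule_period(chain, stages), cores_used(stages))
```

In this toy chain, replicating the stateless middle task three times brings its period down to that of the first stateful task, so adding further replicas would use more cores without improving throughput. That is the kind of trade-off between throughput and resource use the abstract refers to.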
Lock-free simulation algorithm to enhance the performance of sequential and parallel DEVS simulators in shared-memory architectures
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-05-15 | DOI: 10.1016/j.jpdc.2025.105105
Román Cárdenas, Patricia Arroba, José L. Risco-Martín
This paper presents a new algorithm for the Discrete EVent System Specification (DEVS) formalism that improves the performance of simulating complex systems by reducing the number of iterations through the model components in each simulation step. It also minimizes unnecessary visits to model components by propagating simulation routines only when necessary. Additionally, we provide two parallel versions of this new simulation algorithm that use work-stealing scheduling and avoid locking mechanisms without compromising the validity of the execution in shared-memory architectures. We implemented the proposed algorithms in the xDEVS simulator and evaluated their performance using the DEVStone synthetic benchmark. The results show that the proposed algorithms outperform state-of-the-art alternatives. For computationally intensive models, parallel implementations achieve high parallelism efficiency. Furthermore, they are more resilient to model complexity than the sequential algorithm, showing better performance for complex models even without computational overhead in state transition functions.
Journal of Parallel and Distributed Computing, vol. 203, Article 105105
Citations: 0
Throughput of Byzantine Broadcast
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-05-15 | DOI: 10.1016/j.jpdc.2025.105104
Ruomu Hou, Haifeng Yu, Prateek Saxena
Byzantine broadcast is a classic problem in distributed computing, with a wide variety of target applications. This work is motivated by the emerging new application of byzantine broadcast in blockchains, which has prompted us to consider the throughput of byzantine broadcast protocols. To our knowledge, this work is the very first to investigate the throughput of byzantine broadcast. We first show that the throughput of every existing byzantine broadcast protocol is far from ideal. We then obtain a simple upper bound on the throughput of byzantine broadcast protocols, showing that no protocol can do better than this upper bound. As the central contribution of this work, we propose a novel byzantine broadcast protocol called OverlayBB. OverlayBB achieves the optimal throughput, and is the very first protocol that can do so. Our protocol does not sacrifice other aspects of the performance.
Journal of Parallel and Distributed Computing, vol. 203, Article 105104
Citations: 0
Flotilla: A scalable, modular and resilient federated learning framework for heterogeneous resources
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-05-14 | DOI: 10.1016/j.jpdc.2025.105103
Roopkatha Banerjee, Prince Modi, Jinal Vyas, Chunduru Sri Abhijit, Tejus Chandrashekar, Harsha Varun Marisetty, Manik Gupta, Yogesh Simmhan
With the recent improvements in mobile and edge computing and rising concerns of data privacy, Federated Learning (FL) has rapidly gained popularity as a privacy-preserving, distributed machine learning methodology. Several FL frameworks have been built for testing novel FL strategies. However, most focus on validating the learning aspects of FL through pseudo-distributed simulation but not on deploying on real edge hardware in a distributed manner to meaningfully evaluate the federated aspects from a systems perspective. Current frameworks are also inherently not designed to support asynchronous aggregation, which is gaining popularity, and have limited resilience to client and server failures. We introduce Flotilla, a scalable and lightweight FL framework. It adopts a “user-first” modular design to help rapidly compose various synchronous and asynchronous FL strategies while being agnostic to the DNN architecture. It uses stateless clients and a server design that separates out the session state, which is periodically or incrementally checkpointed. We demonstrate the modularity of Flotilla by evaluating five different FL strategies for training five DNN models. We also evaluate the client and server-side fault tolerance on 200+ clients, and showcase its ability to fail over rapidly, within seconds. Finally, we show that Flotilla's resource usage on Raspberry Pis and Nvidia Jetson edge accelerators is comparable to or better than that of three state-of-the-art FL frameworks, Flower, OpenFL and FedML. It also scales significantly better than Flower for 1000+ clients. This positions Flotilla as a competitive candidate to build novel FL strategies on, compare them uniformly, rapidly deploy them, and perform systems research and optimizations.
Journal of Parallel and Distributed Computing, vol. 203, Article 105103
Citations: 0
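For readers unfamiliar with what a synchronous FL strategy computes on the server, the sketch below shows sample-weighted averaging of client model parameters, the aggregation step widely known as FedAvg. The Flotilla abstract does not name the five strategies it evaluates, so choosing FedAvg here is an assumption made purely for illustration. This is not Flotilla's API, and the function name `federated_average` and the flat-list parameter representation are hypothetical.

```python
from typing import List, Sequence, Tuple

# One client update: (flattened model parameters, number of local samples).
ClientUpdate = Tuple[Sequence[float], int]


def federated_average(updates: List[ClientUpdate]) -> List[float]:
    """Sample-weighted mean of client parameters (FedAvg-style aggregation)."""
    if not updates:
        raise ValueError("need at least one client update")
    total_samples = sum(n for _, n in updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(updates[0][0])
    aggregated = [0.0] * dim
    for params, n in updates:
        if len(params) != dim:
            raise ValueError("all clients must report the same parameter count")
        for i, value in enumerate(params):
            # Each client contributes in proportion to its local data size.
            aggregated[i] += value * n / total_samples
    return aggregated


if __name__ == "__main__":
    round_updates = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
    print(federated_average(round_updates))
```

A synchronous round waits for all selected clients before calling such a function. An asynchronous strategy, which the abstract says existing frameworks support poorly, would instead fold each update into the global model as it arrives.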
Teaching parallel and distributed computing using data-intensive computing modules
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-05-12 | DOI: 10.1016/j.jpdc.2025.105093
Michael Gowanlock
Parallel and distributed computing (PDC) courses are useful for computer science (CS) and domain science students. For CS students, PDC is a fundamental field that examines concepts relating to a range of CS subfields, such as algorithms, architecture, simulation, software, systems, among others. Students with domain science backgrounds also require PDC to carry out their research objectives, and the ongoing data revolution has exacerbated this necessity. Given the rise of data science and other data-enabled computational fields, we propose several data-intensive pedagogic modules that are used to teach PDC using message-passing programming with the Message Passing Interface (MPI). These modules employ activities that are interesting, relevant, and accessible to both computer and domain science students enrolled in graduate level programs.
Using pre- and post-module completion quizzes and anonymous free response surveys, we evaluated the efficacy of the pedagogic modules across four cohorts of students enrolled in a graduate level High Performance Computing (HPC) course at Northern Arizona University. The students have diverse educational backgrounds as some students were enrolled in programs outside of CS. These programs include electrical and computer engineering, mechanical engineering, astronomy & planetary science, bioinformatics, and ecoinformatics. Despite the multi-disciplinary backgrounds of the students, we find that the hands-on application-driven approach to teaching PDC was successful at helping students learn core PDC concepts, and that the modules are useful for facilitating online learning which was required during the COVID-19 pandemic.
Journal of Parallel and Distributed Computing, vol. 202, Article 105093
Citations: 0
Cognitive behavioural characteristics identification for remote user authentication for cybersecurity
IF 3.4 | Zone 3 (Computer Science) | Q1 Computer Science, Theory & Methods | Pub Date: 2025-05-01 | DOI: 10.1016/j.jpdc.2025.105102
Ahmet Orun, Emre Orun, Fatih Kurugollu
Nowadays cyber-attacks keep threatening global networks and information infrastructures. Day-by-day, the threat is gradually getting more destructive and harder to counter, as the global networks continue to enlarge exponentially with limited security counter-measures. This occurrence urgently demands more sophisticated methods and techniques, such as multi-factor authentication and soft biometrics to respond to evolving threats. This paper is concerned with behavioural soft biometrics and proposes a multidisciplinary remote cognitive observation technique to meet today’s cybersecurity needs. The proposed method introduces a non-traditional “cognitive psychology” and “artificial intelligence” based approach. According to contemporary cognitive psychology research, human cognitive processes can be affected by many different personal factors and emotional states which are specific to an individual. Those factors mainly include personal perception, memory, decision-making, reasoning, learning, etc. In this study we focus on visual (graphical) perception with the support of graphical stimuli environments and investigate how such personal cognitive factors can be exploited within the cybersecurity area for remote user authentication. This technique enables remote access to the cognitive behavioural parameters of an intruder/hacker without any physical contact via online connection, disregarding the distance of the threat. The results show that cognitive stimuli provide crucial information for a behavioural user authentication system to classify the user as “authentic” or “intruder”. The ultimate goal of this work is to develop a supplementary cognitive cyber security tool for “next generation” secure online banking, finance or trade systems.
Journal of Parallel and Distributed Computing, volume 202, Article 105102
Citations: 0
Teaching parallel and distributed computing in a single undergraduate-level course
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-04-29 DOI: 10.1016/j.jpdc.2025.105092
Tia Newhall
As the application of parallel distributed computing (PDC) becomes ever more pervasive, it is increasingly important that undergraduate CS curricula expose students to a wide range of PDC topics in order to prepare them for the workforce. We present the curricular design and learning goals of an upper-level undergraduate course that covers a wide breadth of topics in parallel and distributed computing, while also providing students with depth of experience and development of problem solving, programming, and analysis skills. We discuss lessons learned from our experiences teaching this course over 15 years, and we discuss changes and improvements we have made in its offerings, as well as choices and trade-offs we made to achieve a balance between breadth and depth of coverage across these two huge fields. Evaluations from students support that our approach works well meeting the goals of exposing students to a broad range of PDC topics, building important PDC thinking and programming skills, and meeting other pedagogical goals of an advanced upper-level undergraduate CS course. Although initially designed as a single course due to constraints that are common to smaller schools, our experiences with this course lead us to conclude that it is a good approach for an advanced undergraduate course on PDC at any institution.
Journal of Parallel and Distributed Computing, volume 202, Article 105092
Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-04-25 DOI: 10.1016/S0743-7315(25)00065-6
Journal of Parallel and Distributed Computing, volume 201, Article 105098
Citations: 0