
Journal of Parallel and Distributed Computing: Latest Publications

Revisiting I/O bandwidth-sharing strategies for HPC applications
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-02-01 | DOI: 10.1016/j.jpdc.2024.104863
A. Benoit, T. Hérault, Lucas Perotin, Yves Robert, F. Vivien
{"title":"Revisiting I/O bandwidth-sharing strategies for HPC applications","authors":"A. Benoit, T. Hérault, Lucas Perotin, Yves Robert, F. Vivien","doi":"10.1016/j.jpdc.2024.104863","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104863","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139818636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
An edge architecture for enabling autonomous aerial navigation with embedded collision avoidance through remote nonlinear model predictive control
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-01-29 | DOI: 10.1016/j.jpdc.2024.104849
Achilleas Santi Seisa, Björn Lindqvist, Sumeet Gajanan Satpute, George Nikolakopoulos

In this article, we present an edge-based architecture for enhancing the autonomous capabilities of resource-constrained aerial robots by enabling a remote nonlinear model predictive control (NMPC) scheme, which can be too computationally heavy to run on the robots' onboard processors. The NMPC is used to control the trajectory of an unmanned aerial vehicle while detecting and preventing potential collisions. The proposed edge architecture enables near-real-time trajectory recalculation for resource-constrained unmanned aerial vehicles, allowing them to exhibit fully autonomous behavior. The architecture is implemented with a remote Kubernetes cluster on the edge side and evaluated on an unmanned aerial vehicle as our controllable robot, while the Robot Operating System (ROS) is used for managing the source code and overall communication. By utilizing edge computing and the architecture presented in this work, we can overcome the computational limitations of resource-constrained robots and provide or improve features that are essential for autonomous missions. At the same time, we can minimize the relative travel-time delays for time-critical missions over the edge, in comparison to the cloud. We investigate the validity of this hypothesis by evaluating the system's behavior through a series of experiments that use either the unmanned aerial vehicle or the edge resources for the collision-avoidance mission.
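
A minimal Python sketch of the kind of offloading loop the abstract describes: the vehicle streams its state to a remote NMPC service on the edge and applies the returned control input, falling back to a safe command when the edge misses its deadline. The endpoint address, message format, and 50 ms deadline are assumptions made for illustration, not the paper's actual interfaces.

    import json
    import socket

    EDGE_ADDR = ("edge-cluster.local", 9000)   # hypothetical edge-service address
    DEADLINE_S = 0.05                          # assumed per-cycle control deadline

    def remote_nmpc_step(state, goal):
        """Send (state, goal) to the edge NMPC solver; return a control dict."""
        with socket.create_connection(EDGE_ADDR, timeout=DEADLINE_S) as sock:
            sock.sendall(json.dumps({"state": state, "goal": goal}).encode() + b"\n")
            reply = sock.makefile().readline()      # one JSON line per control cycle
        return json.loads(reply)

    def control_loop(read_state, apply_control, goal):
        hover = {"thrust": 0.5, "roll": 0.0, "pitch": 0.0, "yaw_rate": 0.0}
        while True:
            state = read_state()
            try:
                u = remote_nmpc_step(state, goal)   # fast path: edge solves the NMPC
            except (OSError, ValueError):           # timeout, connection loss, bad reply
                u = hover                           # deadline missed: safe fallback
            apply_control(u)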

Citations: 0
An active queue management for wireless sensor networks with priority scheduling strategy
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-01-26 | DOI: 10.1016/j.jpdc.2024.104848
Changzhen Zhang, Jun Yang, Ning Wang

In Wireless Sensor Networks (WSNs), packet congestion leads to high delay and a high packet-loss rate, which severely affects the timely transmission of real-time packets. As a congestion-control method, Random Early Detection (RED) can stabilize the queue length at a low level. However, it does not classify WSN traffic to achieve targeted queue management. Since real-time packets are more urgent and important than non-real-time packets, differentiated packet scheduling and queue management are necessary. To address these problems, we propose an Active Queue Management (AQM) method called Classified Enhanced Random Early Detection (CERED). In CERED, preemption priority is conferred on real-time packets, and queue management with an enhanced initial drop probability is applied to non-real-time packets. Next, we develop a preemptive-priority M/M/1/C vacation queueing model with queue management to evaluate the proposed method, and the finite-state matrix-geometric method is used to solve for the stationary distribution of the queueing model. We then formulate a non-linear integer programming problem for the minimum delay of real-time packets, subject to constraints on the steady state and system cost. Finally, a numerical example demonstrates the effectiveness of the proposed method.
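
As a rough illustration of the queue-management side, the following Python sketch applies a RED-style drop decision with class differentiation in the spirit of CERED: real-time packets keep the baseline drop curve, while non-real-time packets face an enhanced initial drop probability. The thresholds, averaging weight, and enhancement factor are illustrative assumptions, not the paper's parameters.

    import random

    W_Q, MIN_TH, MAX_TH, P_MAX = 0.002, 5, 15, 0.1
    ENHANCE = 2.0        # assumed boost applied to non-real-time traffic

    avg_q = 0.0

    def should_drop(queue_len, realtime):
        """Return True if an arriving packet should be dropped."""
        global avg_q
        avg_q = (1 - W_Q) * avg_q + W_Q * queue_len   # EWMA of queue length (RED)
        if avg_q < MIN_TH:
            return False
        if avg_q >= MAX_TH:
            return True
        p = P_MAX * (avg_q - MIN_TH) / (MAX_TH - MIN_TH)
        if not realtime:
            p = min(1.0, ENHANCE * p)                 # enhanced drop for non-real-time
        return random.random() < p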

Citations: 0
A multi-objective grey-wolf optimization based approach for scheduling on cloud platforms
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-01-22 | DOI: 10.1016/j.jpdc.2024.104847
Minhaj Ahmad Khan , Raihan ur Rasool

A cloud computing environment processes user workloads or tasks by exploiting its high-performance computational, storage, and network resources. The virtual machines in the cloud environment are allocated to tasks with the aim of reducing overall execution time. The use of high-performance resources incurs monetary costs as well as high power consumption. Heuristic-based approaches to task scheduling are unable to cope with the complexity of optimizing multiple parameters. In this paper, we propose a multi-objective grey-wolf-optimization-based algorithm for scheduling tasks on cloud platforms. The proposed algorithm aims to minimize the schedule length (overall execution time), energy consumption, and monetary cost required for executing tasks. For optimization, the algorithm performs iterative steps that mimic the behavior of grey wolves attacking their prey, using discrete values to position wolves for encircling and attacking the prey. Tasks are assigned to virtual machines using the solution found after a multi-objective optimization that incorporates weighted sorting to rank solutions. Our experimentation using the CloudSim framework shows that the proposed algorithm outperforms other algorithms, with performance improvements ranging from 3.98% to 16.07% when considering schedule length, monetary cost, and energy consumption.
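
To make the optimization step concrete, here is a hedged Python sketch of a discrete grey-wolf position update for task-to-VM assignment, with a weighted multi-objective fitness over makespan, monetary cost, and energy. The weights, the rounding-based discretization, and the data layout are assumptions made for illustration; the paper's exact encoding may differ.

    import random

    N_TASKS, N_VMS = 20, 4
    W = (0.4, 0.3, 0.3)                      # assumed weights: time, cost, energy

    def fitness(pos, exec_time, price, power):
        """Weighted objective: makespan + monetary cost + energy for one assignment."""
        loads = [0.0] * N_VMS
        cost = energy = 0.0
        for task, vm in enumerate(pos):
            t = exec_time[task][vm]
            loads[vm] += t
            cost += t * price[vm]
            energy += t * power[vm]
        return W[0] * max(loads) + W[1] * cost + W[2] * energy

    def gwo_step(wolf, leaders, a):
        """Move one wolf toward the alpha/beta/delta leaders; keep positions discrete."""
        new = []
        for d in range(N_TASKS):
            est = 0.0
            for leader in leaders:                    # alpha, beta, delta positions
                A = 2 * a * random.random() - a
                C = 2 * random.random()
                est += leader[d] - A * abs(C * leader[d] - wolf[d])
            new.append(min(N_VMS - 1, max(0, round(est / 3))))
        return new

    # One iteration would call gwo_step for every wolf, re-rank the pack by
    # fitness, and shrink a linearly from 2 to 0 across iterations.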

Citations: 0
Antipaxos: Taking interactive consistency to the next level
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-01-17 | DOI: 10.1016/j.jpdc.2024.104839
Chunyu Mao , Wojciech Golab , Bernard Wong

Classical Paxos-like consensus protocols limit system scalability due to a single leader and the inability to process conflicting proposals in parallel. We introduce a novel agreement protocol, called Antipaxos, that instead reaches agreement on a collection of proposals using an efficient leaderless fast path when the environment is synchronous and failure-free, and falls back on a more elaborate slow path to handle other cases. We first specify the main safety property of Antipaxos by formalizing a new agreement problem called k-Interactive Consistency (k-IC). Then, we present a solution to this problem in the Byzantine failure model. We prove safety and liveness, and also present an experimental performance evaluation in the Amazon cloud. Our experiments show that Antipaxos achieves several-fold higher failure-free peak throughput than Mir-BFT. The inherent efficiency of our approach stems from the low message complexity of the fast path: agreement on n batches of conflict-prone proposals is achieved using only Θ(n²) messages in one consensus cycle, or Θ(n) amortized messages per batch.
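
The message-complexity claim can be checked with a small back-of-the-envelope Python simulation, under the simplifying assumption (consistent with the abstract, not the full protocol) that each of the n nodes broadcasts its batch to every other node and decides once it holds all n batches: one cycle costs n(n-1) = Θ(n²) messages, i.e. Θ(n) amortized per batch.

    def fast_path_messages(n):
        inbox = {i: [] for i in range(n)}       # batches received by each node
        messages = 0
        for sender in range(n):                 # every node broadcasts its own batch
            for receiver in range(n):
                if receiver != sender:
                    inbox[receiver].append(sender)
                    messages += 1
        # each node now holds its own batch plus n-1 received ones, i.e. all n
        assert all(len(got) == n - 1 for got in inbox.values())
        return messages

    for n in (4, 8, 16):
        print(n, fast_path_messages(n), fast_path_messages(n) // n)  # ~n-1 per batch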

Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-01-16 | DOI: 10.1016/S0743-7315(24)00007-8
{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(24)00007-8","DOIUrl":"https://doi.org/10.1016/S0743-7315(24)00007-8","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0743731524000078/pdfft?md5=a3e8019093c0d8c91175061677e5cf8e&pid=1-s2.0-S0743731524000078-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139479948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HyLAC: Hybrid linear assignment solver in CUDA
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-01-15 | DOI: 10.1016/j.jpdc.2024.104838
Samiran Kawtikwar , Rakesh Nagi

The Linear Assignment Problem (LAP) is a fundamental combinatorial optimization problem with a wide range of applications. Over the years, significant progress has been made in developing efficient algorithms to solve the LAP, particularly in the realm of high-performance computing, leading to remarkable reductions in computation time. In recent years, hardware improvements in General Purpose Graphics Processing Units (GPGPUs) have shown promise in meeting the ever-increasing compute bandwidth requirements. This has attracted researchers to develop GPU-accelerated algorithms to solve the LAP.

Recent work in the GPU domain has uncovered parallelism available in the problem structure to achieve significant performance improvements. However, each solution presented so far targets either sparse or dense instances of the problem and leaves some scope for improvement. The Hungarian algorithm is one of the most famous approaches to solving the LAP in polynomial time; it has a classical O(N⁴) implementation (Munkres') and a tree-based O(N³) implementation (Lawler's). It is well established that the Munkres' implementation is faster for sparse LAP instances, while the Lawler's implementation is faster for dense instances. In this work, we blend the GPU implementations of Munkres' and Lawler's to develop a hybrid GPU-accelerated solver for the LAP that switches between the two implementations based on the available sparsity. We also improve the existing GPU implementations to reduce memory contention, minimize CPU-GPU synchronizations, and coalesce memory accesses. The resulting solver (HyLAC) runs faster than existing CPU/GPU LAP solvers for sparse as well as dense problem instances, achieving a speedup of up to 6.14× over the existing state-of-the-art GPU implementation when run on the same hardware. We also develop an implementation that solves a list of small LAPs (tiled LAP), which is particularly useful in the optimization domain; this tiled LAP solver performs 22.59× faster than the existing implementation.
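
A CPU-side Python sketch of the hybrid dispatch idea, for illustration only: measure the fraction of usable entries in the cost matrix and route the instance to the sparse-friendly or dense-friendly solver accordingly. Here both branches fall back on SciPy's Hungarian solver as a stand-in for HyLAC's two GPU kernels; the 0.5 threshold, the BIG sentinel, and the density measure are assumptions.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    BIG = 1e9                      # assumed sentinel marking forbidden task-agent pairs
    DENSITY_SWITCH = 0.5           # assumed sparse/dense cut-over point

    def solve_lap_hybrid(cost):
        """cost: square ndarray; entries >= BIG are treated as missing edges."""
        density = (cost < BIG).mean()
        kernel = "munkres-style (sparse)" if density < DENSITY_SWITCH \
                 else "lawler-style (dense)"
        rows, cols = linear_sum_assignment(cost)   # stand-in for the selected kernel
        return kernel, rows, cols, cost[rows, cols].sum()

    rng = np.random.default_rng(0)
    dense = rng.random((64, 64))
    sparse = np.where(rng.random((64, 64)) < 0.2, rng.random((64, 64)), BIG)
    for m in (dense, sparse):
        print(solve_lap_hybrid(m)[0])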

Citations: 0
Reliable IoT analytics at scale
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-01-15 | DOI: 10.1016/j.jpdc.2024.104840
Panagiotis Gkikopoulos , Peter Kropf , Valerio Schiavoni , Josef Spillner

Societies and legislators are moving towards automated decision-making based on measured data in safety-critical environments. Over the coming years, the density and frequency of measurements will increase to generate more insights and provide a more solid basis for decisions, including through redundant low-cost sensor deployments. The resulting data characteristics lead to large-scale system designs in which small input-data errors may cause severe cascading problems, including ultimately wrong decisions. To ensure internal data consistency and mitigate this risk in such IoT environments, fast-paced data fusion and consensus among redundant measurements must be achieved. In this context, we introduce history-aware sensor fusion powered by accurate voting with clustering as a promising approach to achieving fast and informed consensus, which can converge to the output up to 4X faster than state-of-the-art history-based voting. Leveraging three case studies, we investigate different voting schemes and show how this approach can improve data accuracy by up to 30% and performance by up to 12% compared to state-of-the-art sensor fusion approaches. We furthermore contribute a specification format for easily deploying our methods in practice and use it to develop a pilot implementation.
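
The voting mechanism can be pictured with a short Python sketch: group redundant one-dimensional readings that lie within a tolerance of each other, weight each group by its sensors' historical reliability, and fuse the winning group's mean. The tolerance and the reliability-update rule are illustrative assumptions, not the paper's calibrated values.

    TOL = 0.5   # assumed clustering tolerance between adjacent readings

    def fuse(readings, reliability):
        """readings: {sensor_id: value}; reliability: {sensor_id: weight}."""
        items = sorted(readings.items(), key=lambda kv: kv[1])
        clusters, current = [], [items[0]]
        for sid, val in items[1:]:
            if val - current[-1][1] <= TOL:        # close enough: same cluster
                current.append((sid, val))
            else:
                clusters.append(current)
                current = [(sid, val)]
        clusters.append(current)
        best = max(clusters, key=lambda c: sum(reliability[s] for s, _ in c))
        fused = sum(v for _, v in best) / len(best)
        winners = {s for s, _ in best}
        for s in readings:                         # history-aware reliability update
            reliability[s] = 0.9 * reliability[s] + (0.1 if s in winners else 0.0)
        return fused

    rel = {"s1": 1.0, "s2": 1.0, "s3": 1.0}
    print(fuse({"s1": 20.1, "s2": 20.3, "s3": 35.0}, rel))  # 20.2; outlier s3 outvoted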

Citations: 0
Speedup and efficiency of computational parallelization: A unifying approach and asymptotic analysis
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-01-10 | DOI: 10.1016/j.jpdc.2023.104835
Guido Schryen

In high-performance computing environments, we observe an ongoing increase in the available number of cores. For example, the current TOP500 list reveals that nine clusters have more than 1 million cores. This development calls for re-emphasizing performance (scalability) analysis and the speedup laws suggested in the literature (e.g., Amdahl's law and Gustafson's law), with a focus on asymptotic performance. Understanding the speedup and efficiency of algorithmic parallelism is useful for several purposes, including the optimization of system operations, temporal predictions of program execution, the analysis of asymptotic properties, and the determination of speedup bounds. However, the literature is fragmented and shows a large diversity and heterogeneity of speedup models and laws. These phenomena make it challenging to obtain an overview of the models and their relationships, to identify the determinants of performance in a given algorithmic and computational context, and, finally, to determine the applicability of performance models and laws to a particular parallel computing setting. In this work, I provide a generic speedup (and thus also efficiency) model for homogeneous computing environments. My approach generalizes many prominent models suggested in the literature and shows that they can be considered special cases of a unifying approach. The genericity of the unifying speedup model is achieved through parameterization. Considering combinations of parameter ranges, I identify six different asymptotic speedup cases and eight different asymptotic efficiency cases. Jointly applying these speedup and efficiency cases, I derive eleven scalability cases, from which I build a scalability typology. Researchers can draw upon the suggested typology to classify their speedup models and to determine asymptotic behavior as the number of parallel processing units increases. The description of two computational experiments demonstrates the practical application of the model and the typology. In addition, my results may be used and extended in future research to address various extensions of my setting.
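
For reference, the two classical laws the article generalizes can be written as executable formulas: Amdahl's law fixes the problem size, while Gustafson's law scales it with the number of processing units. In the Python sketch below, p denotes the parallelizable fraction and n the number of units.

    def amdahl(p, n):
        """Fixed-size speedup: S = 1 / ((1-p) + p/n)."""
        return 1.0 / ((1.0 - p) + p / n)

    def gustafson(p, n):
        """Scaled speedup: S = (1-p) + p*n."""
        return (1.0 - p) + p * n

    for n in (10, 1_000, 1_000_000):            # up to TOP500-scale core counts
        print(n, amdahl(0.95, n), gustafson(0.95, n))
    # Amdahl saturates at 1/(1-p) = 20x, while Gustafson keeps growing with n,
    # illustrating why asymptotic behavior depends on the underlying speedup model.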

Citations: 0
Proactive auto-scaling technique for web applications in container-based edge computing using federated learning model
IF 3.8 | CAS Tier 3 (Computer Science) | Q1 (Mathematics) | Pub Date: 2024-01-09 | DOI: 10.1016/j.jpdc.2024.104837
Javad Dogani, Farshad Khunjush

Edge computing has emerged as an attractive alternative to traditional cloud computing by utilizing processing, network, and storage resources close to end devices, such as Internet of Things (IoT) sensors. Edge computing is still in its infancy, and resource provisioning and service scheduling remain open research concerns. Kubernetes is a container orchestration tool for distributed environments. Proactive auto-scaling techniques in Kubernetes improve utilization by allocating resources based on predicted future workload. However, prediction models typically run on central cloud nodes, necessitating data transfer between edge and cloud nodes, which increases latency and response time. We present FedAvg-BiGRU, a proactive auto-scaling method for edge computing based on FedAvg and multi-step prediction by a Bidirectional Gated Recurrent Unit (BiGRU). FedAvg is a technique for training machine learning models in a federated learning (FL) setting. FL reduces network traffic by exchanging only model updates rather than raw data, removing the need to store training data on a centralized cloud server. In addition, we develop a technique for determining the number of Kubernetes pods based on the Cool Down Time (CDT) concept, preventing contradictory scaling actions. To our knowledge, our work is the first to employ FL for proactive auto-scaling in cloud and edge computing. The results demonstrate that the FedAvg-BiGRU method has a slightly higher prediction error than the centralized processing mode, although the difference is not statistically significant, while reducing the amount of data transmitted between the edge nodes and the cloud server.
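
As a concrete reference point for the aggregation step, the following Python sketch shows the standard FedAvg update: the server averages client parameter vectors weighted by local sample counts, so only model updates (never raw workload data) leave an edge node. Flat parameter lists stand in for the BiGRU's weight tensors; the example values are invented for illustration.

    def fedavg(client_weights, client_samples):
        """client_weights: list of per-client parameter lists; returns the
        sample-weighted average, i.e. the new global model."""
        total = sum(client_samples)
        n_params = len(client_weights[0])
        avg = [0.0] * n_params
        for weights, samples in zip(client_weights, client_samples):
            for i, w in enumerate(weights):
                avg[i] += (samples / total) * w
        return avg

    # Example round with three edge nodes:
    global_model = fedavg(
        [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],   # locally trained parameters
        [100, 50, 150],                          # local dataset sizes
    )
    print(global_model)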

Citations: 0