Revisiting I/O bandwidth-sharing strategies for HPC applications
A. Benoit, T. Hérault, Lucas Perotin, Yves Robert, F. Vivien
Pub Date: 2024-02-01 | DOI: 10.1016/j.jpdc.2024.104863
An edge architecture for enabling autonomous aerial navigation with embedded collision avoidance through remote nonlinear model predictive control
Achilleas Santi Seisa, Björn Lindqvist, Sumeet Gajanan Satpute, George Nikolakopoulos
Pub Date: 2024-01-29 | DOI: 10.1016/j.jpdc.2024.104849
In this article, we present an edge-based architecture for enhancing the autonomous capabilities of resource-constrained aerial robots by enabling a remote nonlinear model predictive control (NMPC) scheme, which can be too computationally heavy to run on the robots' onboard processors. The NMPC scheme controls the trajectory of an unmanned aerial vehicle while detecting and preventing potential collisions. The proposed edge architecture enables trajectory recalculation for resource-constrained unmanned aerial vehicles in near real-time, allowing them to exhibit fully autonomous behavior. The architecture is implemented with a remote Kubernetes cluster on the edge side and evaluated on an unmanned aerial vehicle as the controllable robot, while the Robot Operating System (ROS) manages the software components and overall communication. By utilizing edge computing and the architecture presented in this work, we can overcome the computational limitations of resource-constrained robots and provide or improve features that are essential for autonomous missions. At the same time, we can minimize the relative travel-time delays for time-critical missions over the edge, in comparison to the cloud. We validate this hypothesis by evaluating the system's behavior through a series of experiments that utilize either the unmanned aerial vehicle or the edge resources for the collision avoidance mission.
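A minimal sketch of the offloading pattern described above: the UAV streams its state to a remote NMPC solver on the edge and applies the returned control. The HTTP endpoint, message schema, and hover fallback are illustrative assumptions, not the authors' ROS-based implementation.

```python
# Sketch of a UAV control loop offloading NMPC to an edge service.
# Assumptions: a hypothetical HTTP endpoint on the edge cluster; real
# deployments would communicate over ROS topics instead.
import time
import requests  # pip install requests

EDGE_NMPC_URL = "http://edge-cluster.local:8080/nmpc/solve"  # hypothetical

def control_loop(get_state, apply_control, period_s=0.05):
    """Send the UAV state to the remote NMPC solver and apply the returned
    control input, falling back to a safe hover command if the edge round
    trip exceeds the control period."""
    while True:
        t0 = time.monotonic()
        state = get_state()  # e.g. position, velocity, attitude
        try:
            resp = requests.post(EDGE_NMPC_URL, json={"state": state},
                                 timeout=period_s)
            u = resp.json()["control"]
        except requests.RequestException:
            u = {"thrust": 0.0, "rates": [0.0, 0.0, 0.0]}  # safe fallback
        apply_control(u)
        time.sleep(max(0.0, period_s - (time.monotonic() - t0)))
```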
{"title":"An edge architecture for enabling autonomous aerial navigation with embedded collision avoidance through remote nonlinear model predictive control","authors":"Achilleas Santi Seisa, Björn Lindqvist, Sumeet Gajanan Satpute, George Nikolakopoulos","doi":"10.1016/j.jpdc.2024.104849","DOIUrl":"10.1016/j.jpdc.2024.104849","url":null,"abstract":"<div><p>In this article, we present an edge-based architecture for enhancing the autonomous capabilities of resource-constrained aerial robots by enabling a remote nonlinear model predictive control scheme, which can be computationally heavy to run on the aerial robots' onboard processors. The nonlinear model predictive control is used to control the trajectory of an unmanned aerial vehicle while detecting, and preventing potential collisions. The proposed edge architecture enables trajectory recalculation for resource-constrained unmanned aerial vehicles in relatively real-time, which will allow them to have fully autonomous behaviors. The architecture is implemented with a remote Kubernetes cluster on the edge side, and it is evaluated on an unmanned aerial vehicle as our controllable robot, while the robotic operating system is used for managing the source codes, and overall communication. With the utilization of edge computing and the architecture presented in this work, we can overcome computational limitations, that resource-constrained robots have, and provide or improve features that are essential for autonomous missions. At the same time, we can minimize the relative travel time delays for time-critical missions over the edge, in comparison to the cloud. We investigate the validity of this hypothesis by evaluating the system's behavior through a series of experiments by utilizing either the unmanned aerial vehicle or the edge resources for the collision avoidance mission.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0743731524000133/pdfft?md5=169e2b20b28c91c01823d3205e5e5fe7&pid=1-s2.0-S0743731524000133-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139588496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An active queue management for wireless sensor networks with priority scheduling strategy
Changzhen Zhang, Jun Yang, Ning Wang
Pub Date: 2024-01-26 | DOI: 10.1016/j.jpdc.2024.104848
In Wireless Sensor Networks (WSNs), packet congestion leads to high delay and a high packet loss rate, which severely affect the timely transmission of real-time packets. As a congestion control method, Random Early Detection (RED) can stabilize the queue length at a low level. However, it does not classify WSN traffic to achieve targeted queue management. Since real-time packets are more urgent and important than non-real-time packets, differentiated packet scheduling and queue management are necessary. To address these problems, we propose an Active Queue Management (AQM) method called Classified Enhanced Random Early Detection (CERED). In CERED, preemptive priority is conferred on real-time packets, and queue management with an enhanced initial drop probability is applied to non-real-time packets. Next, we develop a preemptive-priority M/M/1/C vacation queueing model with queue management to evaluate the proposed method, and the finite-state matrix-geometric method is used to solve for the stationary distribution of the queueing model. We then formulate a non-linear integer programming problem for the minimum delay of real-time packets, subject to constraints on the steady state and system cost. Finally, a numerical example demonstrates the effectiveness of the proposed method.
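To make the drop policy concrete, here is a minimal sketch of the RED-style drop decision that CERED builds on; the threshold values and the handling of the two traffic classes are illustrative assumptions, not the paper's exact parameters.

```python
# Sketch of a classified RED admission decision.
# Assumptions: illustrative thresholds; CERED's enhanced initial drop
# probability for non-real-time packets is approximated by a plain RED ramp.
import random

def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Classic RED: no drops below min_th, linear ramp up to max_p at max_th,
    and certain drop beyond max_th."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def admit(packet_is_realtime, avg_queue):
    # Real-time packets get preemptive priority and bypass the RED drop;
    # non-real-time packets face the (enhanced) initial drop probability.
    if packet_is_realtime:
        return True
    p = red_drop_probability(avg_queue, min_th=5, max_th=15, max_p=0.2)
    return random.random() >= p
```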
{"title":"An active queue management for wireless sensor networks with priority scheduling strategy","authors":"Changzhen Zhang, Jun Yang, Ning Wang","doi":"10.1016/j.jpdc.2024.104848","DOIUrl":"10.1016/j.jpdc.2024.104848","url":null,"abstract":"<div><p><span><span>In Wireless Sensor Networks (WSNs), the packet congestion will lead to high delay and high </span>packet loss<span> rate, which severely affects the timely transmission of real-time packets. As a congestion control method<span><span>, Random Early Detection (RED) is able to stabilize the queue length at a low level. However, it does not classify the data of WSNs to achieve a targeted queue management. Since real-time packets are more urgent and important than non-real-time packets, differential </span>packets scheduling<span> and queue management are necessary. To deal with these problems, we propose an Active Queue Management (AQM) method called Classified Enhanced Random Early Detection (CERED). In CERED, the preemption priority is conferred on real-time packets, and the queue management with enhanced initial drop probability is implemented for non-real-time packets. Next, we develop a preemptive priority M/M/1/</span></span></span></span><em>C</em><span> vacation queueing model with queue management to evaluate the proposed method, and the finite-state matrix geometry method is used to solve the stationary distribution of the queueing model. Then we formulate a non-linear integer programming problem for the minimum delay of real-time packets, which subjects to constraints on the steady state and system cost. Finally, a numerical example is given to show the effectiveness of the proposed method.</span></p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139588416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A multi-objective grey-wolf optimization based approach for scheduling on cloud platforms
Minhaj Ahmad Khan, Raihan ur Rasool
Pub Date: 2024-01-22 | DOI: 10.1016/j.jpdc.2024.104847
A cloud computing environment processes user workloads or tasks by exploiting its high-performance computational, storage, and network resources. The virtual machines in the cloud environment are allocated to tasks with the aim of reducing overall execution time. The use of high-performance resources incurs monetary costs as well as high power consumption. Heuristic-based approaches to task scheduling are unable to cope with the complexity of optimizing multiple parameters. In this paper, we propose a multi-objective grey-wolf optimization based algorithm for scheduling tasks on cloud platforms. The proposed algorithm aims to minimize schedule length (overall execution time), energy consumption, and the monetary cost required for executing tasks. For optimization, the algorithm iteratively performs steps that mimic the behavior of grey wolves encircling and attacking their prey, using discrete values to position the wolves. Tasks are assigned to virtual machines using the solution found after multi-objective optimization, which incorporates weighted sorting to rank candidate solutions. Our experiments using the CloudSim framework show that the proposed algorithm outperforms other algorithms, with performance improvements ranging from 3.98% to 16.07% across schedule length, monetary cost, and energy consumption.
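For illustration, a minimal sketch of a discrete grey-wolf position update for task-to-VM assignment; the averaging of the three leader estimates and the rounding/clamping scheme follow the standard GWO formulation and are assumptions about this paper's variant.

```python
# Sketch of one discrete grey-wolf optimization (GWO) position update.
# A "position" is a list mapping task index -> VM index; alpha, beta, delta
# are the three best solutions found so far, and `a` is the standard GWO
# coefficient that decays from 2 to 0 over the iterations.
import random

def update_position(wolf, alpha, beta, delta, a, num_vms):
    new_pos = []
    for j in range(len(wolf)):
        estimates = []
        for leader in (alpha, beta, delta):
            A = 2 * a * random.random() - a      # encircling coefficient
            C = 2 * random.random()
            D = abs(C * leader[j] - wolf[j])     # distance to the leader
            estimates.append(leader[j] - A * D)  # move towards the prey
        vm = round(sum(estimates) / 3)           # discretize the average
        new_pos.append(min(max(vm, 0), num_vms - 1))  # clamp to valid VMs
    return new_pos
```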
{"title":"A multi-objective grey-wolf optimization based approach for scheduling on cloud platforms","authors":"Minhaj Ahmad Khan , Raihan ur Rasool","doi":"10.1016/j.jpdc.2024.104847","DOIUrl":"10.1016/j.jpdc.2024.104847","url":null,"abstract":"<div><p><span>A cloud computing environment processes user workloads or tasks by exploiting its high performance computational, storage, of reducing and network resources. The virtual machines in the cloud environment are allocated to tasks with the aim of reducing overall execution time. The use of high performance resources incurs monetary costs, as well as high </span>power consumption. The heuristic based approaches implemented for scheduling tasks are unable to cope with the complexity of optimizing multiple parameters. In this paper, we propose a multi-objective grey-wolf optimization based algorithm for scheduling tasks on cloud platforms. The proposed algorithm targets to minimize schedule length (overall execution time), energy consumption, and monetary cost required for executing tasks. For optimization, the algorithm incorporates steps that are performed iteratively for mimicking the behavior of grey wolves attacking their prey. It uses discrete values for positioning wolves for encircling and attacking the prey. The assignment of tasks to virtual machines is performed using the solution found after multi-objective optimization that incorporates weighted sorting for arranging solutions. Our experimentation performed using the CloudSim framework shows that the proposed algorithm outperforms other algorithms with performance improvement ranging from 3.98% to 16.07%, while considering the schedule length, monetary cost, and energy consumption.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139553778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Antipaxos: Taking interactive consistency to the next level
Chunyu Mao, Wojciech Golab, Bernard Wong
Pub Date: 2024-01-17 | DOI: 10.1016/j.jpdc.2024.104839
Classical Paxos-like consensus protocols limit system scalability due to a single leader and the inability to process conflicting proposals in parallel. We introduce a novel agreement protocol, called Antipaxos, that instead reaches agreement on a collection of proposals using an efficient leaderless fast path when the environment is synchronous and failure-free, and falls back on a more elaborate slow path to handle other cases. We first specify the main safety property of Antipaxos by formalizing a new agreement problem called k-Interactive Consistency (k-IC). Then, we present a solution to this problem in the Byzantine failure model. We prove safety and liveness, and also present an experimental performance evaluation in the Amazon cloud. Our experiments show that Antipaxos achieves several-fold higher failure-free peak throughput than Mir-BFT. The inherent efficiency of our approach stems from the low message complexity of the fast path: agreement on n batches of conflict-prone proposals is achieved using only Θ(n²) messages in one consensus cycle, or Θ(n) amortized messages per batch.
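A small worked example of the fast-path message accounting, assuming the fast path is an all-to-all exchange, which is one common way to obtain Θ(n²) messages per cycle; the protocol's actual message pattern may differ.

```python
# Sketch of the fast-path message count under an assumed all-to-all exchange:
# every node sends its batch to every other node, so one cycle costs
# n*(n-1) = Θ(n^2) messages while deciding n batches, i.e. Θ(n) per batch.
def fast_path_messages(n):
    total = n * (n - 1)     # messages in one consensus cycle
    per_batch = total / n   # amortized over the n batches decided
    return total, per_batch

print(fast_path_messages(4))  # (12, 3.0)
```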
{"title":"Antipaxos: Taking interactive consistency to the next level","authors":"Chunyu Mao , Wojciech Golab , Bernard Wong","doi":"10.1016/j.jpdc.2024.104839","DOIUrl":"10.1016/j.jpdc.2024.104839","url":null,"abstract":"<div><p>Classical Paxos-like consensus protocols limit system scalability due to a single leader and the inability to process conflicting proposals in parallel. We introduce a novel agreement protocol, called Antipaxos, that instead reaches agreement on a collection of proposals using an efficient leaderless fast path when the environment is synchronous and failure-free, and falls back on a more elaborate slow path to handle other cases. We first specify the main safety property of Antipaxos by formalizing a new agreement problem called <em>k</em>-<em>Interactive Consistency</em> (<em>k</em>-<em>IC</em>). Then, we present a solution to this problem in the Byzantine failure model. We prove safety and liveness, and also present an experimental performance evaluation in the Amazon cloud. Our experiments show that Antipaxos achieves several-fold higher failure-free peak throughput than Mir-BFT. The inherent efficiency of our approach stems from the low message complexity of the fast path: agreement on <em>n</em> batches of conflict-prone proposals is achieved using only <span><math><mi>Θ</mi><mo>(</mo><msup><mrow><mi>n</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>)</mo></math></span> messages in one consensus cycle, or <span><math><mi>Θ</mi><mo>(</mo><mi>n</mi><mo>)</mo></math></span> amortized messages per batch.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0743731524000030/pdfft?md5=56b477c7bd90e57e09dc0b78ca7891dc&pid=1-s2.0-S0743731524000030-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139495260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
Pub Date: 2024-01-16 | DOI: 10.1016/S0743-7315(24)00007-8
HyLAC: Hybrid linear assignment solver in CUDA
Samiran Kawtikwar, Rakesh Nagi
Pub Date: 2024-01-15 | DOI: 10.1016/j.jpdc.2024.104838
The Linear Assignment Problem (LAP) is a fundamental combinatorial optimization problem with a wide range of applications. Over the years, significant progress has been made in developing efficient algorithms to solve the LAP, particularly in the realm of high-performance computing, leading to remarkable reductions in computation time. In recent years, hardware improvements in General Purpose Graphics Processing Units (GPGPUs) have shown promise in meeting the ever-increasing compute bandwidth requirements. This has attracted researchers to develop GPU-accelerated algorithms to solve the LAP.
Recent work in the GPU domain has uncovered parallelism available in the problem structure to achieve significant performance improvements. However, each solution presented so far targets either sparse or dense instances of the problem and leaves some scope for improvement. The Hungarian algorithm is one of the most famous approaches to solving the LAP in polynomial time; it has classical O(N⁴) (Munkres') and tree-based O(N³) (Lawler's) implementations. It is well established that the Munkres' implementation is faster for sparse LAP instances, while the Lawler's implementation is faster for dense instances. In this work, we blend the GPU implementations of Munkres' and Lawler's algorithms to develop a hybrid GPU-accelerated LAP solver that switches between the two implementations based on the available sparsity. We also improve the existing GPU implementations to reduce memory contention, minimize CPU-GPU synchronizations, and coalesce memory accesses. The resulting solver (HyLAC) runs faster than existing CPU/GPU LAP solvers for both sparse and dense problem instances, achieving a speedup of up to 6.14× over the existing state-of-the-art GPU implementation when run on the same hardware. We also develop an implementation that solves a list of small LAPs (tiled LAP), which is particularly useful in the optimization domain; this tiled LAP solver performs 22.59× faster than the existing implementation.
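A minimal sketch of the sparsity-based dispatch idea; the 10% density threshold is an illustrative assumption, and SciPy's CPU solver stands in for both of the paper's GPU kernels.

```python
# Sketch of HyLAC-style dispatch: pick the solver variant by cost-matrix
# density. Assumptions: illustrative threshold; scipy's CPU LAP solver
# stands in for the classical O(N^4) and tree-based O(N^3) GPU kernels.
import numpy as np
from scipy.optimize import linear_sum_assignment

DENSITY_THRESHOLD = 0.1  # illustrative switch point

def munkres_style(cost):  # stand-in for the sparse-friendly GPU kernel
    return linear_sum_assignment(cost)

def lawler_style(cost):   # stand-in for the dense-friendly GPU kernel
    return linear_sum_assignment(cost)

def solve_lap(cost, infeasible=1e18):
    """cost: (N, N) matrix with `infeasible` marking forbidden assignments."""
    density = np.mean(cost < infeasible)  # fraction of feasible entries
    if density < DENSITY_THRESHOLD:
        return munkres_style(cost)  # faster on sparse instances
    return lawler_style(cost)       # faster on dense instances
```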
Reliable IoT analytics at scale
Panagiotis Gkikopoulos, Peter Kropf, Valerio Schiavoni, Josef Spillner
Pub Date: 2024-01-15 | DOI: 10.1016/j.jpdc.2024.104840
Societies and legislators are moving towards automated decision-making based on measured data in safety-critical environments. Over the coming years, the density and frequency of measurements will increase to generate more insights and provide a more solid basis for decisions, including through redundant low-cost sensor deployments. The resulting data characteristics lead to large-scale system designs in which small input-data errors may cascade into severe problems, including ultimately wrong decisions. To mitigate this risk in such IoT environments, fast-paced data fusion and consensus among redundant measurements must ensure internal data consistency. In this context, we introduce history-aware sensor fusion powered by accurate voting with clustering as a promising approach to fast and informed consensus, converging to the output up to 4X faster than state-of-the-art history-based voting. Leveraging three case studies, we investigate different voting schemes and show how this approach can improve data accuracy by up to 30% and performance by up to 12% compared to state-of-the-art sensor fusion approaches. We furthermore contribute a specification format for easily deploying our methods in practice and use it to develop a pilot implementation.
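A minimal sketch of voting with clustering over redundant readings; the one-dimensional gap-based clustering is an illustrative assumption, and the paper's history-aware weighting is omitted.

```python
# Sketch of cluster-based voting over redundant sensor readings:
# group nearby readings, then let the largest cluster win the vote,
# so isolated faulty readings are outvoted rather than averaged in.
def vote(readings, gap=0.5):
    """Cluster sorted readings whenever consecutive values differ by more
    than `gap`, then return the mean of the largest cluster."""
    clusters, current = [], []
    for r in sorted(readings):
        if current and r - current[-1] > gap:
            clusters.append(current)
            current = []
        current.append(r)
    clusters.append(current)
    winner = max(clusters, key=len)  # majority cluster wins the vote
    return sum(winner) / len(winner)

print(vote([20.1, 20.2, 19.9, 35.0, 20.0]))  # outlier 35.0 is outvoted
```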
{"title":"Reliable IoT analytics at scale","authors":"Panagiotis Gkikopoulos , Peter Kropf , Valerio Schiavoni , Josef Spillner","doi":"10.1016/j.jpdc.2024.104840","DOIUrl":"10.1016/j.jpdc.2024.104840","url":null,"abstract":"<div><p>Societies and legislations are moving towards automated decision-making based on measured data in safety-critical environments. Over the next years, density and frequency of measurements will increase to generate more insights and get a more solid basis for decisions, including through redundant low-cost sensor deployments. The resulting data characteristics lead to large-scale system design in which small input data errors may lead to severe cascading problems including ultimately wrong decisions. To ensure internal data consistency to mitigate this risk in such IoT environments, fast-paced data fusion and consensus among redundant measurements need to be achieved. In this context, we introduce <em>history-aware sensor fusion</em> powered by <em>accurate voting with clustering</em> as a promising approach to achieve fast and informed consensus, which can converge to the output up to 4X faster than the state of the art history-based voting. Leveraging three case studies, we investigate different voting schemes and show how this approach can improve data accuracy by up to 30% and performance by up to 12% compared to state-of-the-art sensor fusion approaches. We furthermore contribute a specification format for easily deploying our methods in practice and use it to develop a pilot implementation.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0743731524000042/pdfft?md5=e9a4a69cafef41438390a264948d954d&pid=1-s2.0-S0743731524000042-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139471113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speedup and efficiency of computational parallelization: A unifying approach and asymptotic analysis
Guido Schryen
Pub Date: 2024-01-10 | DOI: 10.1016/j.jpdc.2023.104835
In high performance computing environments, we observe an ongoing increase in the available number of cores. For example, the current TOP500 list reveals that nine clusters have more than 1 million cores. This development calls for re-emphasizing performance (scalability) analysis and speedup laws as suggested in the literature (e.g., Amdahl's law and Gustafson's law), with a focus on asymptotic performance. Understanding speedup and efficiency issues of algorithmic parallelism is useful for several purposes, including the optimization of system operations, temporal predictions on the execution of a program, the analysis of asymptotic properties, and the determination of speedup bounds. However, the literature is fragmented and shows a large diversity and heterogeneity of speedup models and laws. These phenomena make it challenging to obtain an overview of the models and their relationships, to identify the determinants of performance in a given algorithmic and computational context, and, finally, to determine the applicability of performance models and laws to a particular parallel computing setting. In this work, I provide a generic speedup (and thus also efficiency) model for homogeneous computing environments. My approach generalizes many prominent models suggested in the literature and shows that they can be considered special cases of a unifying approach. The genericity of the unifying speedup model is achieved through parameterization. Considering combinations of parameter ranges, I identify six different asymptotic speedup cases and eight different asymptotic efficiency cases. Jointly applying these speedup and efficiency cases, I derive eleven scalability cases, from which I build a scalability typology. Researchers can draw upon my suggested typology to classify their speedup model and to determine the asymptotic behavior when the number of parallel processing units increases. Two computational experiments demonstrate the practical application of the model and the typology. In addition, my results may be used and extended in future research to address various extensions of my setting.
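As a worked example, here are Amdahl's and Gustafson's laws, the two classical special cases that any unifying speedup model must recover (f is the parallelizable fraction, p the number of processing units).

```python
# Amdahl's law (fixed problem size) vs. Gustafson's law (scaled problem size).
def amdahl_speedup(f, p):
    # The serial fraction (1 - f) bounds the asymptotic speedup by 1/(1 - f).
    return 1.0 / ((1.0 - f) + f / p)

def gustafson_speedup(f, p):
    # With problem size scaled to the machine, speedup grows linearly in p.
    return (1.0 - f) + f * p

for p in (16, 1024, 1_000_000):
    print(p, amdahl_speedup(0.95, p), gustafson_speedup(0.95, p))
# Amdahl converges to 1/(1 - 0.95) = 20, while Gustafson keeps growing:
# the two laws describe different asymptotic regimes of the same program.
```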
{"title":"Speedup and efficiency of computational parallelization: A unifying approach and asymptotic analysis","authors":"Guido Schryen","doi":"10.1016/j.jpdc.2023.104835","DOIUrl":"10.1016/j.jpdc.2023.104835","url":null,"abstract":"<div><p>In high performance computing environments, we observe an ongoing increase in the available number of cores. For example, the current TOP500 list reveals that nine clusters have more than 1 million cores. This development calls for re-emphasizing performance (scalability) analysis and speedup laws as suggested in the literature (e.g., Amdahl's law and Gustafson's law), with a focus on asymptotic performance. Understanding speedup and efficiency issues of algorithmic parallelism is useful for several purposes, including the optimization of system operations, temporal predictions on the execution of a program, the analysis of asymptotic properties, and the determination of speedup bounds. However, the literature is fragmented and shows a large diversity and heterogeneity of speedup models and laws. These phenomena make it challenging to obtain an overview of the models and their relationships, to identify the determinants of performance in a given algorithmic and computational context, and, finally, to determine the applicability of performance models and laws to a particular parallel computing setting. In this work, I provide a generic speedup (and thus also efficiency) model for homogeneous computing environments. My approach generalizes many prominent models suggested in the literature and allows showing that they can be considered special cases of a unifying approach. The genericity of the unifying speedup model is achieved through parameterization. Considering combinations of parameter ranges, I identify six different asymptotic speedup cases and eight different asymptotic efficiency cases. Jointly applying these speedup and efficiency cases, I derive eleven scalability cases, from which I build a scalability typology. Researchers can draw upon my suggested typology to classify their speedup model and to determine the asymptotic behavior when the number of parallel processing units increases. Also, the description of two computational experiments demonstrates the practical application of the model and the typology. In addition, my results may be used and extended in future research to address various extensions of my setting.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0743731523002058/pdfft?md5=b1091089b28a3b14d3e6a9e8596be005&pid=1-s2.0-S0743731523002058-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139411882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Proactive auto-scaling technique for web applications in container-based edge computing using federated learning model
Javad Dogani, Farshad Khunjush
Pub Date: 2024-01-09 | DOI: 10.1016/j.jpdc.2024.104837
Edge computing has emerged as an attractive alternative to traditional cloud computing by utilizing processing, network, and storage resources close to end devices, such as Internet of Things (IoT) sensors. Edge computing is still in its infancy, and resource provisioning and service scheduling remain open research concerns. Kubernetes is a container orchestration tool for distributed environments, and proactive auto-scaling techniques in Kubernetes improve utilization by allocating resources based on predicted future workload. However, prediction models typically run on central cloud nodes, necessitating data transfer between edge and cloud nodes, which increases latency and response time. We present FedAvg-BiGRU, a proactive auto-scaling method for edge computing based on FedAvg and multi-step prediction with a Bidirectional Gated Recurrent Unit (BiGRU). FedAvg is a technique for training machine learning models under Federated Learning (FL). FL reduces network traffic by exchanging only model updates rather than raw data, relieving learning models of the need to store data on a centralized cloud server. In addition, we develop a technique for determining the number of Kubernetes pods based on the Cool Down Time (CDT) concept, preventing contradictory scaling actions. To our knowledge, our work is the first to employ FL for proactive auto-scaling in cloud and edge computing. The results demonstrate that the FedAvg-BiGRU method has a slightly higher prediction error than the centralized processing mode, although the difference is not statistically significant. At the same time, it reduces the amount of data transmitted between the edge nodes and the cloud server.
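A minimal sketch of the FedAvg aggregation step; representing model weights as NumPy arrays is an assumption, and the BiGRU model itself is omitted.

```python
# Sketch of FedAvg: the server averages per-client model weights, weighted
# by the number of local samples each edge node trained on. Only these
# updates, never the raw workload traces, leave the edge nodes.
import numpy as np

def fedavg(client_weights, client_sizes):
    """client_weights: list of per-client models, each a list of layer
    arrays; client_sizes: number of local training samples per client."""
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(num_layers)
    ]
```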
{"title":"Proactive auto-scaling technique for web applications in container-based edge computing using federated learning model","authors":"Javad Dogani, Farshad Khunjush","doi":"10.1016/j.jpdc.2024.104837","DOIUrl":"10.1016/j.jpdc.2024.104837","url":null,"abstract":"<div><p>Edge computing has emerged as an attractive alternative to traditional cloud computing by utilizing processing, network, and storage resources close to end devices, such as Internet of Things (IoT) sensors. Edge computing is still in its infancy, and resource provisioning and service scheduling remain research concerns. Kubernetes is a container orchestration tool in distributed environments. Proactive auto-scaling techniques in Kubernetes improve utilization by allocating resources based on future workload prediction. However, prediction models run on central cloud nodes, necessitating data transfer between edge and cloud nodes, which increases latency and response time. We present FedAvg-BiGRU, a proactive auto-scaling method in edge computing based on FedAvg and multi-step prediction by a Bidirectional Gated Recurrent Unit (BiGRU). FedAvg is a technique for training machine learning models in a Federated Learning (FL) model. FL reduces network traffic by exchanging only model updates rather than raw data, relieving learning models of the need to store data on a centralized cloud server. In addition, a technique for determining the number of Kubernetes pods based on the Cool Down Time (CDT) concept has been developed, preventing contradictory scaling actions. To our knowledge, our work is the first to employ FL for proactive auto-scaling in cloud and edge computing. The results demonstrate that the FedAvg-BiGRU method has a slightly higher prediction error than the load centralized processing mode, although this difference is not statistically significant. At the same time, it reduces the amount of data transmission between the edge nodes and the cloud server.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139411974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}