
Journal of Parallel and Distributed Computing: Latest Articles

Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.8 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-06-03 · DOI: 10.1016/S0743-7315(24)00094-7
Citations: 0
Optimizing CNN inference speed over big social data through efficient model parallelism for sustainable web of things
IF 3.8 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-05-31 · DOI: 10.1016/j.jpdc.2024.104927
Yuhao Hu, Xiaolong Xu, Muhammad Bilal, Weiyi Zhong, Yuwen Liu, Huaizhen Kou, Lingzhen Kong

The rapid development of artificial intelligence and networking technologies has catalyzed the popularity of intelligent services based on deep learning in recent years, which in turn fosters the advancement of the Web of Things (WoT). Big social data (BSD) plays an important role in the processing of intelligent services in WoT. However, intelligent BSD services are computationally intensive and require ultra-low latency, which end or edge devices with limited computing power cannot deliver on their own. Distributed inference of deep neural networks (DNNs), which allocates the computing load of a DNN to several devices, is considered a feasible solution. In this work, an efficient model parallelism method that couples convolution layer (Conv) splitting with resource allocation is proposed. First, given a random computing resource allocation strategy, the Conv split decision is made through a mathematical analysis method to realize parallel inference of convolutional neural networks (CNNs). Next, Deep Reinforcement Learning is used to obtain the optimal computing resource allocation strategy, maximizing the resource utilization rate and minimizing CNN inference latency. Finally, simulation results show that our approach outperforms the baselines and is applicable to BSD services in WoT under high workloads.
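The paper's split decision comes from its mathematical analysis and DRL allocator, which are not reproduced here; the following minimal NumPy sketch only illustrates why a Conv layer admits such a split: each (simulated) device receives a contiguous chunk of the input plus a halo of K−1 samples, computes its slice of the output independently, and the concatenated slices equal the full result.

```python
import numpy as np

def conv1d(x, k):
    # "valid" 1-D convolution (cross-correlation), kernel length K
    K = len(k)
    return np.array([np.dot(x[i:i + K], k) for i in range(len(x) - K + 1)])

def split_conv1d(x, k, n_devices):
    # Give each device a contiguous input chunk extended by a halo of K-1
    # samples, so its share of the output needs no communication.
    K = len(k)
    out_len = len(x) - K + 1
    bounds = np.linspace(0, out_len, n_devices + 1, dtype=int)
    parts = [conv1d(x[lo:hi + K - 1], k)
             for lo, hi in zip(bounds[:-1], bounds[1:])]
    return np.concatenate(parts)
```

In the paper the per-device chunk sizes are chosen jointly with the resource allocation; here the split is simply uniform.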

Citations: 0
Topo: Towards a fine-grained topological data processing framework on Tianhe-3 supercomputer
IF 3.8 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-05-31 · DOI: 10.1016/j.jpdc.2024.104926
Nan Hu, Yutong Lu, Zhuo Tang, Zhiyong Liu, Dan Huang, Zhiguang Chen

Big data frameworks are widely deployed on supercomputers for analyzing large-scale datasets. Topological data processing is an emerging approach that focuses on analyzing the topological structures in high-dimensional scientific data. However, incorporating topological data processing into current big data frameworks presents three main challenges: (1) frequent data exchange strains traditional coarse-grained parallelism; (2) spatial topology makes parallel programming with oversimplified MapReduce APIs harder; (3) massive intermediate data and NUMA architectures hinder resource utilization and scalability on novel supercomputers and many-core processors.

In this paper, we present Topo, a generic distributed framework that enhances topological data processing on many-core supercomputers. Topo relies on three concepts: (1) it employs fine-grained parallelism, aware of the topological structures in datasets, to support interactions among collaborating workers before each shuffle phase; (2) it provides intuitive APIs for topological data operations; (3) it implements efficient collective I/O and NUMA-aware dynamic task scheduling to improve multi-threading and load balancing. We evaluate Topo's performance on the Tianhe-3 supercomputer, which uses state-of-the-art ARM many-core processors. Execution-time results show that, compared to popular frameworks, Topo achieves average speedups of 5.3× and 6.3×, with maximum speedups of 8.4× and 20×, on HPC workloads and big data benchmarks, respectively. Topo further reduces total execution time on skewed datasets by 41%.
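Topo's actual scheduler is not published in the abstract; the toy below only illustrates the general NUMA-aware dynamic scheduling idea it names (an assumed design, not Topo's code): each NUMA node keeps its own queue, workers prefer tasks whose data is node-local, and they steal from the fullest remote queue only when idle.

```python
from collections import deque

def numa_schedule(tasks, n_nodes):
    # tasks: list of (home_node, task_id) pairs.
    # Returns an execution log of (worker_node, task_id, was_local) triples.
    queues = [deque() for _ in range(n_nodes)]
    for home, tid in tasks:
        queues[home].append((home, tid))
    log = []
    while any(queues):
        for w in range(n_nodes):
            if queues[w]:
                _, tid = queues[w].popleft()          # local, fast path
                log.append((w, tid, True))
            else:
                victim = max(range(n_nodes), key=lambda v: len(queues[v]))
                if queues[victim]:
                    _, tid = queues[victim].pop()     # steal from the back
                    log.append((w, tid, False))
    return log
```

Stealing from the back of the victim's queue is a common convention that reduces contention with the victim's own front-of-queue pops.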

Citations: 0
Routing and wavelength assignment for folded hypercube in linear array WDM optical networks
IF 3.8 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-05-28 · DOI: 10.1016/j.jpdc.2024.104924
V. Vinitha Navis, A. Berin Greeni

The folded hypercube is one of the hypercube variants and is of great significance in the study of interconnection networks. In a folded hypercube, information can be broadcast using efficient distributed algorithms. In the context of parallel computing, the folded hypercube has been studied as a possible network topology alternative to the hypercube. The routing and wavelength assignment (RWA) problem is significant because it improves the performance of wavelength-routed all-optical networks constructed using the wavelength division multiplexing approach. Given the physical network topology, the aim of the RWA problem is to establish routes for the connection requests and assign the fewest possible wavelengths subject to the wavelength continuity and distinct wavelength constraints. This paper addresses the RWA problem in a linear array for the folded hypercube communication pattern using the congestion technique.
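The congestion technique lower-bounds the wavelength count by the maximum link load of the embedding, since all requests crossing one fiber link need distinct wavelengths. The sketch below (an illustration, not the paper's construction) lays the folded hypercube FQ_n on a path in binary node order, routes every edge over the path segment between its endpoints, and reports the maximum congestion.

```python
def folded_hypercube_edges(n):
    # FQ_n = hypercube Q_n plus an edge from each node to its bitwise complement
    N = 1 << n
    edges = set()
    for u in range(N):
        for b in range(n):
            v = u ^ (1 << b)                 # hypercube edge (flip one bit)
            edges.add((min(u, v), max(u, v)))
        c = u ^ (N - 1)                      # complement edge
        edges.add((min(u, c), max(u, c)))
    return edges

def linear_array_congestion(n):
    # Lay the 2^n nodes on a path in binary order; route edge (u, v) over
    # path links u..v-1. Max link load lower-bounds the wavelengths needed.
    N = 1 << n
    load = [0] * (N - 1)
    for u, v in folded_hypercube_edges(n):
        for i in range(u, v):
            load[i] += 1
    return max(load)
```

For FQ_2 (which is K4) this layout gives congestion 4, and for FQ_3 congestion 8; the paper's contribution is an assignment matching such bounds.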

Citations: 0
Fast hardware-aware matrix-free algorithms for higher-order finite-element discretized matrix multivector products on distributed systems
IF 3.8 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-05-27 · DOI: 10.1016/j.jpdc.2024.104925
Gourab Panigrahi, Nikhil Kodali, Debashis Panda, Phani Motamarri

Recent hardware-aware matrix-free algorithms for higher-order finite-element (FE) discretized matrix-vector multiplications reduce floating point operations and data access costs compared to traditional sparse matrix approaches. In this work, we address a critical gap in existing matrix-free implementations, which are not well suited for the action of FE discretized matrices on a very large number of vectors. In particular, we propose efficient matrix-free algorithms for evaluating FE discretized matrix-multivector products on both multi-node CPU and GPU architectures. To this end, we employ batched evaluation strategies, with the batch size tailored to the underlying hardware architecture, leading to better data locality and enabling further parallelization. On CPUs, we utilize even-odd decomposition, SIMD vectorization, and overlapping computation and communication strategies. On GPUs, we develop strategies to overlap compute with data movement, achieving efficient pipelining and reduced data accesses through the use of GPU shared memory, constant memory, and kernel fusion. Our implementation outperforms the baselines for the Helmholtz operator action on 1024 vectors, achieving up to 1.4x improvement on one CPU node and up to 2.8x on one GPU node, while reaching up to 4.4x and 1.5x improvement on multiple nodes for CPUs (3072 cores) and GPUs (24 GPUs), respectively.
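Even-odd decomposition, one of the CPU techniques named above, exploits the central symmetry A[i, j] = A[m−1−i, n−1−j] of 1-D FE interpolation matrices: splitting a vector into even- and odd-symmetric halves replaces one full multiply with two half-size multiplies, roughly halving the flops. A minimal NumPy sketch (assuming even dimensions for brevity; not the paper's implementation):

```python
import numpy as np

def even_odd_matvec(A, x):
    # Assumes A[i, j] == A[m-1-i, n-1-j] and even m, n.
    m, n = A.shape
    h, w = m // 2, n // 2
    xe = 0.5 * (x[:w] + x[::-1][:w])          # even-symmetric half of x
    xo = 0.5 * (x[:w] - x[::-1][:w])          # odd-symmetric half of x
    Be = A[:h, :w] + A[:h, ::-1][:, :w]       # folded half-size operators
    Bo = A[:h, :w] - A[:h, ::-1][:, :w]
    ye, yo = Be @ xe, Bo @ xo                 # two (m/2)x(n/2) multiplies
    y = np.empty(m)
    y[:h] = ye + yo                           # top rows directly
    y[h:] = (ye - yo)[::-1]                   # bottom rows by symmetry
    return y
```

The two half-size products cost mn/2 multiply-adds instead of mn, which is where the flop saving over a plain `A @ x` comes from.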

Citations: 0
PerfTop: Towards performance prediction of distributed learning over general topology
IF 3.8 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-05-24 · DOI: 10.1016/j.jpdc.2024.104922
Changzhi Yan, Zehan Zhu, Youcheng Niu, Cong Wang, Cheng Zhuo, Jinming Xu

Distributed learning with multiple GPUs has been widely adopted to accelerate the training process of large-scale deep neural networks. However, misconfiguration of GPU clusters with various communication primitives and topologies can diminish the gains of parallel computation and significantly degrade training efficiency. Predicting the performance of distributed learning enables service providers to identify potential bottlenecks beforehand. In this work, we propose a Performance prediction framework over General Topologies, called PerfTop, for accurate estimation of per-iteration execution time. The main strategy is to integrate computation time prediction with an analytical model that captures the nonlinearity in communication and fine-grained computation-communication patterns. This enables accurate prediction for a variety of neural network models over general topologies, such as tree, hierarchical, and exponential. Our extensive experiments show that PerfTop outperforms existing methods in estimating both computation and communication time, surpassing them by over 45% for communication in particular. Meanwhile, it achieves above 85% accuracy in predicting execution time over general topologies, whereas previous works handled only simple topologies such as star and ring.
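PerfTop's own analytical model is not given in the abstract; as an illustration of the analytical-model idea it builds on, here is the classic alpha-beta cost model for a ring all-reduce combined with compute time and an overlap factor. All parameters (alpha, beta, overlap) are hypothetical inputs, not values from the paper.

```python
def allreduce_time(msg_bytes, n_workers, alpha, beta):
    # Ring all-reduce: 2(p-1) steps, each moving msg/p bytes per link.
    # alpha = per-message latency (s), beta = per-byte transfer time (s/B).
    p = n_workers
    return 2 * (p - 1) * (alpha + (msg_bytes / p) * beta)

def iteration_time(compute_s, msg_bytes, n_workers, alpha, beta, overlap=0.0):
    # overlap in [0, 1]: fraction of communication hidden behind compute.
    comm = allreduce_time(msg_bytes, n_workers, alpha, beta)
    return compute_s + (1.0 - overlap) * comm
```

A purely linear model like this misses the nonlinearity PerfTop targets; its role here is only to show what "integrating computation time with an analytical communication term" means.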

Citations: 0
Local outlier factor for anomaly detection in HPCC systems
IF 3.8 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-05-23 · DOI: 10.1016/j.jpdc.2024.104923
Arya Adesh, Shobha G, Jyoti Shetty, Lili Xu

Local Outlier Factor (LOF) is an unsupervised anomaly detection algorithm that finds anomalies by assessing the local density of a data point relative to its neighborhood. Anomaly detection is the process of finding anomalies in datasets; anomalies in real-time datasets may indicate critical events like bank fraud, data compromise, or network threats. This paper deals with the implementation of the LOF algorithm on the HPCC Systems platform, an open-source distributed computing platform for big data analytics. An improved LOF is also proposed that efficiently detects anomalies in datasets rich in duplicates. The impact of varying hyperparameters on the performance of LOF in HPCC Systems is examined. The paper compares the performance of LOF with other algorithms such as COF, LoOP, and kNN over several datasets in HPCC Systems. Additionally, the efficacy of LOF is evaluated across big-data frameworks such as Spark, Hadoop, and HPCC Systems by comparing their runtime performance.
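For reference, the classic LOF of Breunig et al. that the article implements on HPCC Systems can be sketched in a few lines of NumPy (a single-machine illustration, not the paper's distributed ECL implementation): scores near 1 mark inliers, scores well above 1 mark outliers.

```python
import numpy as np

def lof_scores(X, k):
    # X: (n, d) points; returns one LOF score per point.
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
    np.fill_diagonal(D, np.inf)
    knn = np.argsort(D, axis=1)[:, :k]            # indices of k nearest neighbours
    kdist = D[np.arange(n), knn[:, -1]]           # distance to the k-th neighbour
    # reachability distance of p from o: max(k-dist(o), d(p, o))
    reach = np.maximum(kdist[knn], D[np.arange(n)[:, None], knn])
    lrd = k / reach.sum(axis=1)                   # local reachability density
    return lrd[knn].mean(axis=1) / lrd            # LOF = mean neighbour lrd / own lrd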

Citations: 0
GraMeR: Graph Meta Reinforcement learning for multi-objective influence maximization
IF 3.8 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-05-23 · DOI: 10.1016/j.jpdc.2024.104900
Sai Munikoti, Balasubramaniam Natarajan, Mahantesh Halappanavar

Influence maximization (IM) is the combinatorial problem of identifying a subset of seed nodes in a network (graph) which, when activated, provides a maximal spread of influence in the network for a given diffusion model and a budget on the seed set size. IM has numerous applications such as viral marketing, epidemic control, sensor placement, and other network-related tasks. However, its practical use is limited by the computational complexity of current algorithms. Recently, deep reinforcement learning has been leveraged to solve IM in order to ease the computational burden. However, current approaches have serious limitations: narrow IM formulations that consider influence only via spread and ignore self-activation, low scalability to large graphs, and a lack of generalizability across graph families, leading to a large running time for every test network. In this work, we address these limitations through a unique approach that involves: (1) formulating a generic IM problem as a Markov decision process that handles both intrinsic and influence activations; (2) incorporating generalizability via meta-learning across graph families. Previous works have combined deep reinforcement learning with graph neural networks, but this work solves a more realistic IM problem and incorporates generalizability across graphs via meta reinforcement learning. Extensive experiments on various standard networks validate the performance of the proposed Graph Meta Reinforcement learning (GraMeR) framework. The results indicate that GraMeR is orders of magnitude faster and more generic than conventional approaches when applied to small- to medium-scale graphs.
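GraMeR itself learns a seed-selection policy with GNN-based meta-RL; the computationally heavy conventional approach it accelerates is the Monte Carlo greedy baseline, sketched here under the Independent Cascade diffusion model (self-activation, which GraMeR also models, is omitted for brevity).

```python
import random

def ic_spread(adj, seeds, p, trials, rng):
    # Monte Carlo estimate of expected spread under Independent Cascade:
    # each newly activated node activates each inactive neighbour w.p. p.
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v in adj.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / trials

def greedy_im(adj, nodes, budget, p=0.1, trials=200, seed=0):
    # Greedy: repeatedly add the node with the largest estimated marginal gain.
    rng = random.Random(seed)
    chosen = []
    for _ in range(budget):
        best = max((v for v in nodes if v not in chosen),
                   key=lambda v: ic_spread(adj, chosen + [v], p, trials, rng))
        chosen.append(best)
    return chosen
```

Each greedy step re-simulates the cascade hundreds of times per candidate, which is exactly the per-network cost that a learned, generalizable policy avoids.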

Citations: 0
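The abstract above contrasts GraMeR with conventional combinatorial approaches. Those baselines are typically variants of the classical greedy algorithm under the independent-cascade diffusion model, which can be sketched as follows (a minimal illustration, not the paper's code; all names and parameters are made up):

```python
import random


def simulate_ic(graph, seeds, p=0.1, rng=random):
    """One run of the independent-cascade diffusion model.

    graph: dict mapping node -> list of neighbor nodes.
    Returns the set of nodes activated starting from `seeds`."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate
                # each inactive neighbor, with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active


def greedy_im(graph, k, p=0.1, runs=200, seed=0):
    """Greedy seed selection: repeatedly add the node with the largest
    marginal gain in expected spread, estimated by Monte Carlo.
    Assumes k <= number of nodes in the graph."""
    rng = random.Random(seed)
    seeds = []
    for _ in range(k):
        best, best_gain = None, -1.0
        for cand in graph:
            if cand in seeds:
                continue
            gain = sum(len(simulate_ic(graph, seeds + [cand], p, rng))
                       for _ in range(runs)) / runs
            if gain > best_gain:
                best, best_gain = cand, gain
        seeds.append(best)
    return seeds
```

The nested Monte-Carlo loop is exactly the computational burden the abstract refers to: every marginal-gain estimate needs many full diffusion simulations, which is what learned policies such as GraMeR aim to avoid at test time.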
Large-scale and cooperative graybox parallel optimization on the supercomputer Fugaku
IF 3.8 CAS Tier 3 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-05-22 DOI: 10.1016/j.jpdc.2024.104921
Lorenzo Canonne , Bilel Derbel , Miwako Tsuji , Mitsuhisa Sato

We design, develop, and analyze parallel variants of a state-of-the-art graybox optimization algorithm, namely Drils (Deterministic Recombination and Iterated Local Search), for attacking large-scale pseudo-boolean optimization problems on top of the large-scale computing facilities offered by the supercomputer Fugaku. We first adopt a Master/Worker design coupled with a fully distributed island-based model, ending up with a number of hybrid OpenMP/MPI implementations of high-level parallel Drils versions. We show that such a design, although effective, can be substantially improved by enabling a more focused iteration-level cooperation mechanism between the core graybox components of the original serial Drils algorithm. Extensive experiments are conducted in order to provide a systematic analysis of the impact of the designed parallel algorithms on search behavior, and of their ability to compute high-quality solutions using an increasing number of CPU cores. Results using up to 1024 × 12-core NUMA nodes and NK-landscapes with up to 10,000 binary variables are reported, providing evidence of the relative strength of the designed hybrid cooperative graybox parallel search.

Citations: 0
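Drils is built around iterated local search over pseudo-boolean functions. A minimal sketch of that serial skeleton is shown below, with onemax standing in for the paper's NK-landscape objectives; function names and parameters are hypothetical, not taken from the Drils source:

```python
import random


def onemax(bits):
    """Stand-in pseudo-boolean objective (the paper uses NK-landscapes)."""
    return sum(bits)


def hill_climb(bits, f):
    """First-improvement local search over single-bit flips."""
    current = f(bits)
    improved = True
    while improved:
        improved = False
        for i in range(len(bits)):
            bits[i] ^= 1          # try flipping bit i
            v = f(bits)
            if v > current:
                current = v       # keep the improving flip
                improved = True
            else:
                bits[i] ^= 1      # revert a non-improving flip
    return bits, current


def perturb(bits, strength, rng):
    """Flip a few random bits to escape the current local optimum."""
    out = bits[:]
    for i in rng.sample(range(len(out)), strength):
        out[i] ^= 1
    return out


def ils(n, f, iters=50, strength=3, seed=0):
    """Iterated local search: climb, perturb, re-climb, keep the better."""
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in range(n)]
    best, best_v = hill_climb(best, f)
    for _ in range(iters):
        cand, cand_v = hill_climb(perturb(best, strength, rng), f)
        if cand_v >= best_v:
            best, best_v = cand, cand_v
    return best, best_v
```

In essence, the island-based parallel variants described in the abstract run many searches of this kind concurrently and let them cooperate by exchanging incumbents, rather than iterating a single serial loop.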
HBPB, applying reuse distance to improve cache efficiency proactively
IF 3.8 CAS Tier 3 (Computer Science) Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-05-20 DOI: 10.1016/j.jpdc.2024.104919
Arthur M. Krause, Paulo C. Santos, Arthur F. Lorenzon, Philippe O.A. Navaux

Cache memories play a significant role in the performance, area, and energy consumption of modern processors, and this impact is expected to grow as on-die memories become larger. While caches are highly effective for cache-friendly access patterns, they introduce unnecessary delays and energy wastage when they fail to serve the required data. Hence, cache bypassing techniques have been proposed to optimize the latency of cache-unfriendly memory accesses. In this scenario, we discuss HBPB, a history-based preemptive bypassing technique that accelerates cache-unfriendly access through the reduced latency of bypassing the caches. By extensively evaluating different real-world applications and hardware cache configurations, we show that HBPB yields energy reductions of up to 75% and performance improvements of up to 50% compared to a version that does not apply cache bypassing. More importantly, we demonstrate that HBPB does not affect the performance of applications with cache-friendly access patterns.

Citations: 0
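The key metric in the title, reuse distance, counts how many distinct addresses are touched between two consecutive accesses to the same address. A minimal sketch of the metric and of a toy history-based bypass predicate follows (illustrative only, assuming a simple address trace; this is not the HBPB implementation):

```python
def reuse_distances(trace):
    """Reuse distance of each access: the number of distinct addresses
    touched since the previous access to the same address
    (infinity for a first access)."""
    last_seen = {}
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct addresses between the two accesses to `addr`.
            distinct = len(set(trace[last_seen[addr] + 1:i]))
            distances.append(distinct)
        else:
            distances.append(float("inf"))
        last_seen[addr] = i
    return distances


def should_bypass(history, cache_capacity):
    """Toy history-based bypass predicate: if even the smallest observed
    reuse distance for an address exceeds the cache capacity, caching it
    is unlikely to produce hits, so the access can skip the cache."""
    finite = [d for d in history if d != float("inf")]
    return bool(finite) and min(finite) >= cache_capacity
```

A line whose reuse distances are consistently larger than the cache capacity would be evicted before its next use anyway; skipping the fill for such lines is the latency and energy saving the abstract describes, while short-distance (cache-friendly) lines are left untouched.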