A multi-level parallel approach to increase the computation efficiency of a global ocean temperature dataset reconstruction
Huifeng Yuan, Lijing Cheng, Yuying Pan, Zhetao Tan, Qian Liu, Zhong Jin
Pub Date: 2024-06-14 | DOI: 10.1016/j.jpdc.2024.104938
There is an increasing need to provide real-time datasets for climate monitoring and applications. However, the data products currently released by international groups are delayed by at least a month. One reason for this delay is the long computing time of the global reconstruction algorithm (the so-called mapping approach). To tackle this issue, this paper proposes a multi-level parallel computing model that improves the efficiency of data construction through parallelization of the computation, reduced branch-prediction overhead, improved data spatial locality and cache utilization, and other measures. The model has been applied to the mapping approach proposed by the Institute of Atmospheric Physics (IAP), which underlies one of the world's most widely used data products in the ocean and climate field. Compared with the traditional serial MATLAB-based construction on a single node, the parallel-optimized construction is ∼4.7 times faster. A large-scale parallel experiment on a long-term (∼1000 months) gridded dataset using over 16,000 processor cores demonstrates the model's scalability, with a speedup of ∼1200 times. In summary, this new model represents another example of the application of high-performance computing in oceanography and climatology.
Using hardware-transactional-memory support to implement speculative task execution
Juan Salamanca, Alexandro Baldassin
Pub Date: 2024-06-14 | DOI: 10.1016/j.jpdc.2024.104939
Loops account for most of the execution time of computer programs, so optimizing them to run as fast as possible is an ongoing task. This task is far from trivial; on the contrary, it remains an open area of research, since many irregular loops are hard to parallelize. Such loops generally have loop-carried (DOACROSS) dependencies, and whether a dependence actually manifests may depend on the execution context. Many techniques have been studied to parallelize these loops efficiently; the OpenMP standard, for example, still offers no efficient way to do so. This article presents Speculative Task Execution (STE), a technique that executes OpenMP tasks speculatively to accelerate certain hot-code regions (such as loops) marked by OpenMP directives. It also presents a detailed analysis of using Hardware Transactional Memory (HTM) support to execute tasks speculatively and a careful evaluation of the STE implementation over HTM on modern machines. In particular, we consider the scenario in which speculative tasks are generated by the OpenMP taskloop construct (Speculative Taskloop, STL). As a result, it provides evidence to support several important claims about the performance of STE over HTM on modern processor architectures. Experimental results reveal that: (a) by implementing STL on top of HTM for hot-code regions, speed-ups of up to 5.39× can be obtained on IBM POWER8 and up to 2.41× on Intel processors using 4 cores; and (b) STL-ROT, a variant of STL using rollback-only transactions (ROTs), achieves speed-ups of up to 17.70× on an IBM POWER9 processor using 20 cores.
PA-SPS: A predictive adaptive approach for an elastic stream processing system
Daniel Wladdimiro, Luciana Arantes, Pierre Sens, Nicolás Hidalgo
Pub Date: 2024-06-14 | DOI: 10.1016/j.jpdc.2024.104940
Stream Processing Systems (SPSs) dynamically process input events. Since the input is usually not a constant flow and presents rate fluctuations, many works in the literature propose to dynamically replicate SPS operators, aiming to reduce the processing bottleneck induced by such fluctuations. However, these SPSs do not consider the problem of load balancing across replicas or the cost of reconfiguring the system whenever the number of replicas changes. In this paper we present a predictive model which, based on input rate variation, operator execution times, and queued events, dynamically determines the number of replicas each operator currently needs. A predictor composed of different models (mathematical and Machine Learning ones) predicts the input rate. We also propose a Storm-based SPS, named PA-SPS, which uses this predictive model and does not require a restart when the number of operator replicas changes. PA-SPS also implements a load balancer that distributes incoming events evenly among the replicas of an operator. We have conducted experiments on Google Cloud Platform (GCP) to evaluate PA-SPS using real traffic traces of different applications, and compared it with Storm and other existing SPSs.
Meta-Fed IDS: Meta-learning and Federated learning based fog-cloud approach to detect known and zero-day cyber attacks in IoMT networks
Umer Zukaib, Xiaohui Cui, Chengliang Zheng, Dong Liang, Salah Ud Din
Pub Date: 2024-06-05 | DOI: 10.1016/j.jpdc.2024.104934
The Internet of Medical Things (IoMT) is a transformative fusion of medical sensors, equipment, and the Internet of Things, positioned to transform healthcare. However, security and privacy concerns hinder widespread IoMT adoption, a problem intensified by the scarcity of high-quality datasets for developing effective security solutions. Addressing these challenges, we propose a novel framework for cyberattack detection in dynamic IoMT networks. The framework integrates Federated Learning with Meta-learning, employing a multi-phase architecture to identify known attacks, and incorporates advanced clustering and biased classifiers to address zero-day attacks. Its deployment is adaptable to dynamic and diverse environments, using an Infrastructure-as-a-Service (IaaS) model on the cloud and a Software-as-a-Service (SaaS) model at the fog end. To reflect real-world scenarios, we introduce a specialized IoMT dataset. Our experimental results indicate high accuracy and low misclassification rates, demonstrating the framework's capability to detect cyber threats in complex IoMT environments. This approach shows significant promise in bolstering cybersecurity in advanced healthcare technologies.
DRACO: Distributed Resource-aware Admission Control for large-scale, multi-tier systems
Domenico Cotroneo, Roberto Natella, Stefano Rosiello
Pub Date: 2024-06-04 | DOI: 10.1016/j.jpdc.2024.104935
Modern distributed systems are designed to manage overload conditions by throttling, through overload control techniques, the excess traffic that cannot be served. However, the adoption of large-scale NoSQL datastores makes systems vulnerable to unbalanced overloads, where specific datastore nodes are overloaded because of hot-spot resources and hogs. In this paper, we propose DRACO, a novel overload control solution that is aware of data dependencies between the application and datastore tiers. DRACO performs selective admission control of application requests, dropping only those that map to resources on overloaded datastore nodes, while achieving high resource utilization on non-overloaded datastore nodes. We evaluate DRACO on two case studies with high availability and performance requirements: a virtualized IP Multimedia Subsystem and a distributed fileserver. Results show that the solution can achieve high performance and resource utilization even under extreme overload conditions, up to 100x the engineered capacity.
Optimizing CNN inference speed over big social data through efficient model parallelism for sustainable web of things
Yuhao Hu, Xiaolong Xu, Muhammad Bilal, Weiyi Zhong, Yuwen Liu, Huaizhen Kou, Lingzhen Kong
Pub Date: 2024-05-31 | DOI: 10.1016/j.jpdc.2024.104927
The rapid development of artificial intelligence and networking technologies has catalyzed the popularity of intelligent services based on deep learning in recent years, which in turn fosters the advancement of the Web of Things (WoT). Big social data (BSD) plays an important role in the processing of intelligent services in WoT. However, intelligent BSD services are computationally intensive and require ultra-low latency, and end or edge devices with limited computing power cannot achieve the extremely low response latency those services demand. Distributed inference, which allocates the computing load of a deep neural network (DNN) across several devices, is considered a feasible solution. In this work, an efficient model parallelism method that couples convolution layer (Conv) splitting with resource allocation is proposed. First, given a random computing resource allocation strategy, the Conv split decision is made through a mathematical analysis method to realize parallel inference of convolutional neural networks (CNNs). Next, Deep Reinforcement Learning is used to obtain the optimal computing resource allocation strategy that maximizes the resource utilization rate and minimizes the CNN inference latency. Finally, simulation results show that our approach performs better than the baselines and is applicable to BSD services in WoT with a high workload.
Topo: Towards a fine-grained topological data processing framework on Tianhe-3 supercomputer
Nan Hu, Yutong Lu, Zhuo Tang, Zhiyong Liu, Dan Huang, Zhiguang Chen
Pub Date: 2024-05-31 | DOI: 10.1016/j.jpdc.2024.104926
Big data frameworks are widely deployed on supercomputers for analyzing large-scale datasets. Topological data processing is an emerging approach that focuses on analyzing the topological structures in high-dimensional scientific data. However, incorporating topological data processing into current big data frameworks presents three main challenges: (1) frequent data exchange challenges traditional coarse-grained parallelism; (2) spatial topology makes parallel programming harder with oversimplified MapReduce APIs; and (3) massive intermediate data and the NUMA architecture hinder resource utilization and scalability on novel supercomputers and many-core processors.
In this paper, we present Topo, a generic distributed framework that enhances topological data processing on many-core supercomputers. Topo relies on three concepts. (1) It employs fine-grained parallelism, with awareness of the topological structures in datasets, to support interactions among collaborative workers before each shuffle phase. (2) It provides intuitive APIs for topological data operations. (3) It implements efficient collective I/O and NUMA-aware dynamic task scheduling to improve multi-threading and load balancing. We evaluate Topo's performance on the Tianhe-3 supercomputer, which uses state-of-the-art ARM many-core processors. Execution-time results show that, compared to popular frameworks, Topo achieves average speedups of 5.3× and 6.3×, with maximum speedups of 8.4× and 20×, on HPC workloads and big data benchmarks, respectively. Topo further reduces the total execution time for processing skewed datasets by 41%.
Routing and wavelength assignment for folded hypercube in linear array WDM optical networks
V. Vinitha Navis, A. Berin Greeni
Pub Date: 2024-05-28 | DOI: 10.1016/j.jpdc.2024.104924
The folded hypercube is one of the hypercube variants and is of great significance in the study of interconnection networks. In a folded hypercube, information can be broadcast using efficient distributed algorithms. In the context of parallel computing, the folded hypercube has been studied as a possible network topology and as an alternative to the hypercube. The routing and wavelength assignment (RWA) problem is significant because its solution determines the performance of wavelength-routed all-optical networks built with the wavelength-division multiplexing approach. Given the physical network topology, the aim of the RWA problem is to establish routes for the connection requests and assign the fewest possible wavelengths while respecting the wavelength-continuity and distinct-wavelength constraints. This paper addresses the RWA problem in a linear array for the folded hypercube communication pattern using the congestion technique.
Fast hardware-aware matrix-free algorithms for higher-order finite-element discretized matrix multivector products on distributed systems
Gourab Panigrahi, Nikhil Kodali, Debashis Panda, Phani Motamarri
Pub Date: 2024-05-27 | DOI: 10.1016/j.jpdc.2024.104925
Recent hardware-aware matrix-free algorithms for higher-order finite-element (FE) discretized matrix-vector multiplications reduce floating-point operations and data-access costs compared to traditional sparse matrix approaches. In this work, we address a critical gap in existing matrix-free implementations, which are not well suited for the action of FE discretized matrices on very large numbers of vectors. In particular, we propose efficient matrix-free algorithms for evaluating FE discretized matrix-multivector products on both multi-node CPU and GPU architectures. To this end, we employ batched evaluation strategies, with the batch size tailored to the underlying hardware architecture, leading to better data locality and enabling further parallelization. On CPUs, we utilize even-odd decomposition, SIMD vectorization, and strategies for overlapping computation and communication. On GPUs, we develop strategies to overlap compute with data movement, achieving efficient pipelining and reduced data accesses through the use of GPU shared memory, constant memory, and kernel fusion. Our implementation outperforms the baselines for the Helmholtz operator action on 1024 vectors, achieving up to 1.4x improvement on one CPU node and up to 2.8x on one GPU node, while reaching up to 4.4x and 1.5x improvement on multiple nodes for CPUs (3072 cores) and GPUs (24 GPUs), respectively. We further benchmark the proposed implementation for solving a model eigenvalue problem for the 1024 smallest eigenvalue-eigenvector pairs using the Chebyshev Filtered Subspace Iteration method, achieving up to 1.5x improvement on one CPU node and up to 2.2x on one GPU node, while reaching up to 3.0x and 1.4x improvement on multi-node CPUs (3072 cores) and GPUs (24 GPUs), respectively.