Journal of Parallel and Distributed Computing: Latest Articles

SpEpistasis: A sparse approach for three-way epistasis detection
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-09-23 DOI: 10.1016/j.jpdc.2024.104989
Epistasis detection is a fundamental application in the areas of bioinformatics and biomedicine, providing important insights regarding the relationship between the human genome and the occurrence of certain diseases. Exhaustive epistasis detection approaches are employed to achieve an accurate and deterministic solution, at the cost of high computational complexity, especially when targeting high-order epistasis. While recent works employ vectorization and cache-blocking techniques to alleviate this burden, these solutions are now limited by the maximum performance of the functional units of computing systems. Thus, to further improve the performance of epistasis detection it is necessary to reduce its number of memory transfers and computations. To tackle this issue, this work proposes SpEpistasis, which performs three-way epistasis detection by relying on sparse features, which by only storing the non-zero elements of the dataset, allows for reducing the number of operations needed for epistasis detection. To achieve this goal, a new hybrid format to represent the input dataset is proposed, which stores a subset of the data in the compressed sparse row format. Moreover, new sparse-aware algorithmic approaches are also proposed in order to leverage both the hybrid format and the vector capabilities of current CPUs from Intel, AMD, and ARM. The experimental results show that SpEpistasis provides a speedup up to 3.7× and average speedups of around 1.8× and 1.33× when compared with other state-of-the-art works.
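
The idea of storing only the non-zero elements can be illustrated with a minimal sketch. This is not the SpEpistasis hybrid format or its vectorized algorithms, which the abstract does not specify; it only shows a plain compressed sparse row (CSR) layout for a small 0/1 matrix and a three-way count obtained by merging sorted column-index lists. The function names (`to_csr`, `row`, `count_three_way`) and the toy data are hypothetical.

```python
def to_csr(dense):
    """Convert a dense 0/1 matrix (list of rows) into CSR index arrays."""
    indptr, indices = [0], []
    for r in dense:
        # Keep only the column positions of non-zero entries.
        indices.extend(j for j, v in enumerate(r) if v != 0)
        indptr.append(len(indices))
    return indptr, indices


def row(indptr, indices, i):
    """Return the sorted non-zero column indices of row i."""
    return indices[indptr[i]:indptr[i + 1]]


def count_three_way(indptr, indices, a, b, c):
    """Count columns where rows a, b and c are all non-zero."""
    ra, rb, rc = (row(indptr, indices, r) for r in (a, b, c))
    i = j = k = count = 0
    while i < len(ra) and j < len(rb) and k < len(rc):
        if ra[i] == rb[j] == rc[k]:
            count += 1
            i, j, k = i + 1, j + 1, k + 1
        else:
            # Advance every list that lags behind the current maximum.
            m = max(ra[i], rb[j], rc[k])
            if ra[i] < m:
                i += 1
            if rb[j] < m:
                j += 1
            if rc[k] < m:
                k += 1
    return count


# Toy data (hypothetical): 3 rows, 6 columns.
dense = [
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
]
indptr, indices = to_csr(dense)
shared = count_three_way(indptr, indices, 0, 1, 2)
```

The merge touches only stored indices, so its cost grows with the number of non-zeros rather than with the full row length.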
Citations: 0
Robust and Scalable Federated Learning Framework for Client Data Heterogeneity Based on Optimal Clustering
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-09-22 DOI: 10.1016/j.jpdc.2024.104990
Federated learning is a promising paradigm for applications across a variety of domains. However, there are some challenges that must be addressed in real-world scenarios, particularly the data heterogeneity among participating clients. Most existing studies primarily focus on the issue of non-independent and identically distributed data, but they do not consider the critical aspect of data quality heterogeneity. When low-quality data is contributed by some clients, the efficacy of models trained through the traditional approaches will be significantly compromised. Therefore, we propose ROSCFL, a robust and scalable federated learning framework for client data heterogeneity based on optimal clustering. We first develop a cluster contribution evaluation strategy based on the optimal clustering to quantify the contribution of each cluster. Next, we design a robust model aggregation strategy, which effectively mitigates the impact of low-quality data on the global model by optimizing weight allocation and client sampling. Finally, we introduce a client incorporation mechanism to enhance the scalability of ROSCFL. Extensive experiments have been conducted, and the results demonstrate that ROSCFL achieves strong robustness and scalability, particularly in scenarios wherein data distribution and quality heterogeneity coexist.
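
As a rough illustration of weight allocation during aggregation, the sketch below averages client updates with weights proportional to a contribution score, so that a low-scoring update has less influence on the result. This is an assumption made for illustration only: the abstract does not give ROSCFL's scoring rule, weighting formula, or sampling scheme, and `weighted_aggregate` and the sample numbers are hypothetical.

```python
def weighted_aggregate(updates, scores):
    """Average parameter vectors with weights proportional to scores.

    updates: list of equal-length lists of floats (one per client or cluster)
    scores:  non-negative contribution scores (hypothetical), one per update
    """
    if not updates or len(updates) != len(scores):
        raise ValueError("need one score per update")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    weights = [s / total for s in scores]
    dim = len(updates[0])
    return [
        sum(w * u[d] for w, u in zip(weights, updates))
        for d in range(dim)
    ]


# Hypothetical example: the second update has a lower score.
updates = [[1.0, 1.0], [3.0, 5.0]]
scores = [3.0, 1.0]
global_update = weighted_aggregate(updates, scores)
```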
Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-09-19 DOI: 10.1016/S0743-7315(24)00146-1
Citations: 0
Survey of federated learning in intrusion detection
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-09-18 DOI: 10.1016/j.jpdc.2024.104976

Intrusion detection methods are crucial means to mitigate network security issues. However, the challenges posed by large-scale complex network environments include local information islands, regional privacy leaks, communication burdens, difficulties in handling heterogeneous data, and storage resource bottlenecks. Federated learning has the potential to address these challenges by leveraging widely distributed and heterogeneous data, achieving load balancing of storage and computing resources across multiple nodes, and reducing the risks of privacy leaks and bandwidth resource demands. This paper reviews the process of constructing federated learning based intrusion detection system from the perspective of intrusion detection. Specifically, it outlines six main aspects: application scenario analysis, federated learning methods, privacy and security protection, selection of classification models, data sources and client data distribution, and evaluation metrics, establishing them as key research content. Subsequently, six research topics are extracted based on these aspects. These topics include expanding application scenarios, enhancing aggregation algorithm, enhancing security, enhancing classification models, personalizing model and utilizing unlabeled data. Furthermore, the paper delves into research content related to each of these topics through in-depth investigation and analysis. Finally, the paper discusses the current challenges faced by research, and suggests promising directions for future exploration.

Citations: 0
The analysis of P2P networks with malicious peers and repairable breakdown based on Geo/Geo/1+1 queue
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-09-16 DOI: 10.1016/j.jpdc.2024.104979

The incredible growth of Peer-to-Peer (P2P) networks has brought with it some complex challenges, such as trust issues and high bandwidth consumption. To address these challenges, this paper analyzes the “free-riding” behavior, system energy consumption, and the benefits of requesting and service peers in the network. A Geo/Geo/1+1 queuing model is built with malicious peers which includes several strategies such as repairable breakdown, synchronized multiple working vacations, differentiated service, and waiting threshold. The matrix-geometric solution method is used to obtain steady-state distribution and performance measures. By conducting numerical experiments and analyzing the impact of each parameter, it is possible to optimize the system's performance and reduce energy consumption. With careful adjustments to parameter values, significant cost savings of requesting peers and energy conservation can be achieved. The resulting analysis provides a comprehensive understanding of the behavior of P2P networks, and the strategies proposed in the study can be used to optimize the performance of P2P networks.

Citations: 0
B2DFL: Bringing butterfly to decentralized federated learning assisted with blockchain
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-09-16 DOI: 10.1016/j.jpdc.2024.104978

We propose a novel decentralized federated learning framework called B2DFL. It decomposes the aggregation process of vanilla FL into layered and serialized sub-aggregation processes and offloads the communication and computation from a single point to distributed nodes, thus addressing the single point of failure issue in centralized FL. The decentralization of B2DFL is based on the Butterfly, a distributed network topology, to organize and orchestrate the order and rules of node aggregation. Additionally, to mitigate potential risks such as dropouts or tampering, we leverage the blockchain and IPFS systems. Specifically, after each node completes its computation (including training and aggregation), it generates a hash value of the results as proof. We maintain a Tamper-evident Data Structure (TDS) on the blockchain, which records these proofs to ensure tamper-proofing and fast verification. To reduce the storage burden on the blockchain and improve throughput, we store the aggregated results on IPFS, a system that enables quick data location through hash values of data, for data backup. We also design a node replacement mechanism for quick dropout handling. We conduct a comprehensive performance evaluation and experimental results demonstrate that B2DFL presents a significant performance improvement while achieving privacy and decentralization.
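
The layered aggregation described above can be sketched with the textbook butterfly exchange pattern, in which node `i` pairs with node `i XOR 2^s` at stage `s`. This is a single-process simulation under stated assumptions, not the B2DFL protocol: it assumes a power-of-two number of nodes, uses a SHA-256 digest as a stand-in for the proof each node would record, and involves no blockchain, IPFS, or dropout handling. The names `digest` and `butterfly_average` are hypothetical.

```python
import hashlib
import json


def digest(values):
    """SHA-256 hex digest of a list of floats (stand-in for a recorded proof)."""
    payload = json.dumps(values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def butterfly_average(models):
    """Average parameter vectors in log2(n) pairwise exchange stages."""
    n = len(models)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two number of nodes")
    state = [list(m) for m in models]
    proofs = []  # proofs[s][i] is the digest of node i's result after stage s
    stage = 1
    while stage < n:
        nxt = []
        for i in range(n):
            partner = i ^ stage  # butterfly pairing for this stage
            nxt.append([a + b for a, b in zip(state[i], state[partner])])
        state = nxt
        proofs.append([digest(s) for s in state])
        stage <<= 1
    averaged = [[v / n for v in s] for s in state]
    return averaged, proofs


# Hypothetical local models held by four nodes.
models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
averaged, proofs = butterfly_average(models)
```

After the last stage every node holds the same sum, so all digests in the final stage match; a mismatch would indicate that some node's result differs.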

Citations: 0
Accelerating Fortran codes: A method for integrating Coarray Fortran with CUDA Fortran and OpenMP
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-09-06 DOI: 10.1016/j.jpdc.2024.104977

Fortran's prominence in scientific computing requires strategies to ensure both that legacy codes are efficient on high-performance computing systems, and that the language remains attractive for the development of new high-performance codes. Coarray Fortran (CAF), part of the Fortran 2008 standard introduced for parallel programming, facilitates distributed memory parallelism with a syntax familiar to Fortran programmers, simplifying the transition from single-processor to multi-processor coding. This research focuses on innovating and refining a parallel programming methodology that fuses the strengths of Intel Coarray Fortran, Nvidia CUDA Fortran, and OpenMP for distributed memory parallelism, high-speed GPU acceleration and shared memory parallelism respectively. We consider the management of pageable and pinned memory, CPU-GPU affinity in NUMA multiprocessors, and robust compiler interfacing with speed optimisation. We demonstrate our method through its application to a parallelised Poisson solver and compare the methodology, implementation, and scaling performance to that of the Message Passing Interface (MPI), finding CAF offers similar speeds with easier implementation. For new codes, this approach offers a faster route to optimised parallel computing. For legacy codes, it eases the transition to parallel computing, allowing their transformation into scalable, high-performance computing applications without the need for extensive re-design or additional syntax.

Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-08-24 DOI: 10.1016/S0743-7315(24)00136-9
Citations: 0
Clustering-based multi-objective optimization considering fairness for multi-workflow scheduling on clouds
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-08-23 DOI: 10.1016/j.jpdc.2024.104968

Distributed computing, such as cloud computing, provides promising platforms for orchestrating scientific workflows' tasks based on their sequences and dependencies. Workflow scheduling plays an important role in optimizing concerned objectives for distributed computing, such as minimizing the makespan and cost. Many researchers have focused on optimizing a specific single workflow with multiple objectives. Currently, there are few studies on multi-workflow scheduling, with most research focusing on objectives such as cost and makespan. However, multi-workflow scheduling requires the design of specific objectives that reflect the unique characteristics of multiple workflows. On the other hand, clustering-based approaches have garnered significant attention in the field of workflow scheduling over distributed computing resources due to their advantage in reducing data communication among tasks. Despite this, the effectiveness of clustering-based algorithms has not been extensively studied and validated in the context of multi-objective multi-workflow scheduling models. Motivated by these factors, we propose an approach for multiple workflows' multi-objective optimization (MOO), considering the new defined metric, fairness. We first mathematically formulate the fairness and define a fairness-involved MOO model. Then, we propose an advanced clustering-based resource optimization strategy in multiple workflow runs. Experimental results show that the proposed approach performs better than the compared algorithms without significant compromise of the overall makespan and cost as well as individual fairness, which can guide the simulation workflow scheduling on clouds.

Citations: 0
StarPlat: A versatile DSL for graph analytics
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-08-14 DOI: 10.1016/j.jpdc.2024.104967

Graphs model several real-world phenomena. With the growth of unstructured and semi-structured data, parallelization of graph algorithms is inevitable. Unfortunately, due to the inherent irregularity of computation, memory access, and communication, graph algorithms are traditionally challenging to parallelize. To tame this challenge, several libraries, frameworks, and domain-specific languages (DSLs) have been proposed to reduce the parallel programming burden on users, who are often domain experts. However, existing frameworks for modeling graph algorithms typically target a single architecture. In this paper, we present a graph DSL, named StarPlat, that allows programmers to specify graph algorithms in a high-level format, yet generates code for three different backends from the same algorithmic specification. In particular, the DSL compiler generates OpenMP for multi-core systems, MPI for distributed systems, and CUDA for many-core GPUs. Since these three are completely different parallel programming paradigms, binding them together under the same language is challenging. We share our experience with the language design. Central to our compiler is an intermediate representation that provides a common representation of the high-level program, from which code generation for each backend begins. We demonstrate the expressiveness of StarPlat by specifying four graph algorithms: betweenness centrality computation, PageRank computation, single-source shortest paths, and triangle counting. Using a suite of ten large graphs, we illustrate the effectiveness of our approach by comparing the performance of the generated code with that of hand-crafted library code. We find that the generated code is competitive with library-based code in many cases. More importantly, we show the feasibility of generating efficient code for different target architectures from the same algorithmic specification of graph algorithms.
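Of the four benchmark algorithms named in the abstract, triangle counting is the shortest to write down. The following is a minimal sequential sketch in Python, offered only to make the benchmark concrete: it is not StarPlat syntax and not code generated by the StarPlat compiler, and the function name `count_triangles` and the edge-list input format are assumptions made for this illustration.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs.

    Hypothetical helper for illustration; vertex ids must be comparable.
    """
    # Build adjacency sets, ignoring self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    total = 0
    # Iterations of this outer loop are independent of one another, so this
    # is the loop a parallel implementation would split across workers.
    for u in list(adj):
        for v in adj[u]:
            if u < v:
                # Only count w > v so each triangle u < v < w is seen once.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total
```

Calling `count_triangles([(0, 1), (1, 2), (0, 2)])` counts the single triangle formed by vertices 0, 1, and 2. The per-vertex independence of the outer loop, combined with the uneven sizes of the adjacency sets, is a small instance of the irregularity the abstract describes.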

Citations: 0