
Journal of Parallel and Distributed Computing: Latest Articles

2-edge-Hamilton-connected dragonfly network
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-04-23 | DOI: 10.1016/j.jpdc.2025.105095
Huimei Guo, Rong-Xia Hao, Jie Wu
Dragonfly networks are used in today's supercomputers, so their topological properties are of interest. Let $G=(V(G),E(G))$ be a graph, and let $X$ be a subset of $\{uv : u,v \in V(G) \text{ and } u \neq v\}$ such that every component induced by $X$ on $V(G)$ is a path. If, for every such $X$ with $|X| \le k$, the graph obtained by adding all edges in $X$ to $G$ contains a Hamiltonian cycle that includes all edges in $X$, then $G$ is called $k$-edge-Hamilton-connected. This property can be used to design and optimize routing and forwarding algorithms. By finding such a Hamiltonian cycle containing specific edges in the network, it can be ensured that every node can act as an intermediate node to forward packets through a specific channel, thus enabling efficient data transmission and routing. For $k=2$, determining whether a graph is $k$-edge-Hamilton-connected is a challenging problem, as it is known to be NP-complete. 2-edge-Hamilton-connectedness is an extension of Hamilton-connectedness. In this paper, we prove that the relative arrangement dragonfly network, a type of dragonfly network whose global connections are constructed from relative arrangements, is 2-edge-Hamilton-connected; this property shows that dragonfly networks have strong reliability. In addition, we determine that $D(n,h,g)$ is 1-Hamilton-connected and paired 2-disjoint path coverable for $n \ge 4$ and $h \ge 2$.
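The definition above can be checked directly on very small graphs. The following is a minimal brute-force sketch, not the paper's proof technique: it enumerates every admissible edge set $X$ and every vertex ordering, so it is only usable for a handful of vertices. The function names, the vertex labelling `0..n-1`, and the requirement `n >= 3` are assumptions made for this illustration.

```python
from itertools import combinations, permutations


def is_linear_forest(n, edges):
    """True if every component of the graph (range(n), edges) is a path."""
    degree = [0] * n
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if degree[u] > 2 or degree[v] > 2:
            return False  # a path has maximum degree 2
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return True


def has_hamiltonian_cycle_through(n, edge_set, required):
    """Brute force: is there a Hamiltonian cycle using every edge in `required`?"""
    for perm in permutations(range(1, n)):
        cycle = (0,) + perm
        cycle_edges = {frozenset((cycle[i], cycle[(i + 1) % n])) for i in range(n)}
        if cycle_edges <= edge_set and required <= cycle_edges:
            return True
    return False


def is_k_edge_hamilton_connected(n, edges, k):
    """Check the definition for a graph on vertices 0..n-1 (assumes n >= 3)."""
    base = {frozenset(e) for e in edges}
    all_pairs = [frozenset(p) for p in combinations(range(n), 2)]
    for size in range(k + 1):
        for chosen in combinations(all_pairs, size):
            if not is_linear_forest(n, [tuple(e) for e in chosen]):
                continue  # X must induce only paths
            x = set(chosen)
            if not has_hamiltonian_cycle_through(n, base | x, x):
                return False
    return True


# Example graphs: the complete graph K5 and the 5-cycle C5.
k5 = list(combinations(range(5), 2))
c5 = [(i, (i + 1) % 5) for i in range(5)]
print(is_k_edge_hamilton_connected(5, k5, 2))
print(is_k_edge_hamilton_connected(5, c5, 1))
```

On these inputs K5 satisfies the definition for $k=2$, while the 5-cycle already fails for $k=1$: adding the chord between vertices 0 and 2 yields no Hamiltonian cycle through that chord.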
Journal of Parallel and Distributed Computing, Volume 202, Article 105095.
Citations: 0
Advanced resource management: A hands-on master course in HPC and cloud computing
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-04-23 | DOI: 10.1016/j.jpdc.2025.105091
Lucia Pons, Salvador Petit, Julio Sahuquillo
Resource management has become a major concern for performance and fairness in recent computing servers, which include a wide variety of shared resources. To achieve high-performing and efficient systems, both hardware and software engineers must be thoroughly trained in effective resource management techniques. This paper introduces the GRE master course (Spanish acronym for Resource Management and Performance Evaluation in Cloud and High-Performance Workloads), which has been offered since Fall 2023. The course is taught by instructors with broad research expertise in resource management and performance evaluation. Subjects covered in this course include workload characterization, state-of-the-art resource management approaches, and performance evaluation tools and methodologies used in production systems. Management techniques are studied in the context of both HPC and cloud computing, where resource efficiency is becoming a primary concern. To enhance the learning experience, the course integrates theoretical concepts with a wide set of hands-on tasks carried out on recent real platforms. A real virtualized cloud environment is mimicked using typical software deployed in production systems, such as Proxmox Virtual Environment. Students learn to use tools such as Linux Perf and Intel VTune Profiler, which are commonly employed by researchers and practitioners to carry out typical tasks like performance bottleneck analysis from a microarchitectural perspective. Overall, the GRE course provides students with a solid foundation and skills in resource management by addressing current hot topics in both industry and academia. Student satisfaction and learning outcomes prove the success of the GRE course and encourage us to continue in this direction.
Journal of Parallel and Distributed Computing, Volume 202, Article 105091.
Citations: 0
HIP-RRTMG_SW: Accelerating a shortwave radiative transfer scheme under the heterogeneous-compute interface for portability (HIP) framework
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-04-23 | DOI: 10.1016/j.jpdc.2025.105094
Zhenzhen Wang, Yuzhu Wang, Fei Li, Jinrong Jiang, Xiaocong Wang
With the development of higher-resolution atmospheric circulation models, the amount of calculation increases polynomially with resolution, and the calculation accuracy of physical processes is increasing rapidly. Traditional parallel computing methods based on multi-core CPUs can no longer meet the efficiency and real-time performance requirements of climate models. To improve the computational efficiency and scalability of the Atmospheric General Circulation Model, it is urgent to study efficient parallel algorithms and performance optimization methods for the radiation physics process, which involves massive calculations. In this paper, a heterogeneous multidimensional acceleration algorithm based on HIP is proposed for the shortwave radiation transfer model (RRTMG_SW). Then, the HIP version of RRTMG_SW, namely HIP-RRTMG_SW, is developed. In addition, combined with the “MPI + HIP” hybrid programming model, a multi-GPU implementation of RRTMG_SW is proposed that makes full use of the multi-node, multi-core CPU and multi-GPU computing capability of a heterogeneous high-performance computing system. Experimental results show that HIP-RRTMG_SW achieves a 7.05× speedup in a climate simulation at 0.25° resolution using 16 AMD GPUs on the ORISE supercomputer, compared with RRTMG_SW using 128 CPU cores. When using 1024 AMD GPUs, HIP-RRTMG_SW is 83.94× faster than RRTMG_SW with 128 CPU cores, indicating that the proposed multi-GPU acceleration algorithm has strong scalability.
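As a rough reading of the two reported speedups, and assuming (the abstract does not state this explicitly) that both runs use the same workload and the same 128-core CPU baseline, the relative gain from 16 to 1024 GPUs can be worked out as follows. The symbols $S_{16}$ and $S_{1024}$ are notation introduced here for the two speedups.

$$
\frac{S_{1024}}{S_{16}} = \frac{83.94}{7.05} \approx 11.9,
\qquad
\frac{1024}{16} = 64,
\qquad
\frac{11.9}{64} \approx 0.19
$$

Under these assumptions, a 64-fold increase in GPU count yields roughly a 12-fold increase in speed, which corresponds to about 19% scaling efficiency relative to the 16-GPU run.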
Journal of Parallel and Distributed Computing, Volume 202, Article 105094.
Citations: 0
Editor's note
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-04-18 | DOI: 10.1016/j.jpdc.2025.105089
Ananth Kalyanaraman
Journal of Parallel and Distributed Computing, Volume 202, Article 105089.
Citations: 0
Efficient parameter tuning for a structure-based virtual screening HPC application
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-04-15 | DOI: 10.1016/j.jpdc.2025.105087
Bruno Guindani, Davide Gadioli, Roberto Rocco, Danilo Ardagna, Gianluca Palermo
Virtual screening applications are highly parameterized to optimize the balance between quality and execution performance. While output quality is critical, the entire screening process must be completed within a reasonable time. In fact, a slight reduction in output accuracy may be acceptable when dealing with large datasets. Finding the optimal quality-throughput trade-off depends on the specific HPC system used and should be re-evaluated with each new deployment or significant code update. This paper presents two parallel autotuning techniques for constrained optimization in distributed High-Performance Computing (HPC) environments. These techniques extend sequential Bayesian Optimization (BO) with two parallel asynchronous approaches, and they integrate predictions from Machine Learning (ML) models to help comply with constraints. Our target application is LiGen, a real-world virtual screening software for drug discovery. The proposed methods address two relevant challenges: efficient exploration of the parameter space and performance measurement using domain-specific metrics and procedures. We conduct an experimental campaign comparing the two methods with a popular state-of-the-art autotuner. Results show that our methods find configurations that are, on average, up to 35–42% better than the ones found by the autotuner and the default expert-picked LiGen configuration.
Journal of Parallel and Distributed Computing, Volume 202, Article 105087.
Citations: 0
Schedule multi-instance microservices to minimize response time under budget constraint in cloud HPC systems
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-04-08 | DOI: 10.1016/j.jpdc.2025.105086
Dong Wang, Hong Shen, Hui Tian, Yuanhao Yang
In the emerging microservice-based architecture of cloud HPC systems, a challenging problem of critical importance for system service capability is how to schedule microservices to minimize the end-to-end response time for user requests while keeping cost within the specified budget. We address this problem for multi-instance microservices requested by a single application, for which, to our knowledge, no existing result is known. We propose an effective two-stage solution that first allocates budget (resources) to microservices within the budget constraint and then deploys microservice instances on servers to minimize system operational overhead. For budget allocation, we formulate it as the Discrete Time Cost Tradeoff (DTCT) problem, which is NP-hard, present a linear program (LP) based algorithm, and provide a rigorous proof of its worst-case performance guarantee of 4 relative to the optimal solution. For microservice deployment, we establish a mathematical model showing that it is harder than the NP-hard problem of 1-D bin packing, and propose a heuristic algorithm, Least First Mapping, that greedily places microservice instances on the fewest possible servers to minimize system operation cost. The results of extensive simulations on DAG-based applications of different sizes demonstrate the superior performance of our algorithm in comparison with existing approaches.
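To make the bin-packing connection concrete, here is a minimal sketch of the classic first-fit decreasing heuristic for 1-D bin packing. This is a standard textbook technique used purely for illustration; it is not the paper's Least First Mapping algorithm, whose details the abstract does not give. The function name, the single scalar "demand" per instance, and the uniform server capacity are all assumptions.

```python
def first_fit_decreasing(demands, capacity):
    """Greedily pack instance demands onto as few equal-capacity servers as the
    heuristic finds; returns one list of demands per server used."""
    servers = []  # each entry is [remaining_capacity, [assigned demands]]
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds server capacity {capacity}")
        for server in servers:
            if server[0] >= demand:
                # Place the instance on the first server with enough room.
                server[0] -= demand
                server[1].append(demand)
                break
        else:
            # No existing server fits, so open a new one.
            servers.append([capacity - demand, [demand]])
    return [assigned for _, assigned in servers]


# Hypothetical resource demands of six microservice instances.
placement = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
print(len(placement), placement)  # prints: 2 [[8, 2], [4, 4, 1, 1]]
```

Sorting demands in decreasing order before placing them is a common design choice because large items are the hardest to fit late; the real deployment problem is stated to be harder than this simplified model.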
Journal of Parallel and Distributed Computing, Volume 202, Article 105086.
Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-04-06 | DOI: 10.1016/S0743-7315(25)00041-3
Journal of Parallel and Distributed Computing, Volume 200, Article 105074.
Citations: 0
Deep embedded lightweight CNN network for indoor objects detection on FPGA
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-04-05 | DOI: 10.1016/j.jpdc.2025.105085
Mouna Afif, Riadh Ayachi, Yahia Said, Mohamed Atri
Indoor object detection and recognition is an active research area in computer vision and artificial intelligence. Various deep learning-based techniques can be applied to solve object detection problems, and the appearance of deep convolutional neural networks (DCNN) brought a major breakthrough for many applications. Indoor object detection is a primary task that can assist Blind and Visually Impaired (BVI) persons during navigation. However, building a reliable indoor object detection system for edge devices still presents a serious challenge. To address this problem, we propose an indoor object detection system based on a DCNN. A cross-stage partial network (CSPNet) was used for the detection process, and a lightweight network based on EfficientNet v2 was used as the backbone. To ensure a lightweight implementation on FPGA devices, various optimization techniques were applied to compress the model size and reduce its computational complexity. The proposed indoor object detection system was implemented on a Xilinx ZCU102 board. Training and testing experiments were conducted on the proposed indoor objects dataset, which comprises 11,000 images covering 25 landmark classes, and on an indoor object detection dataset. The proposed work achieved 82.60 mAP at 28 FPS for the original version and 80.04 mAP at 35 FPS for the compressed version.
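The trade-off between the original and compressed models follows directly from the figures reported above. This is simple arithmetic on the stated numbers, not an additional result from the paper.

$$
\Delta\text{mAP} = 82.60 - 80.04 = 2.56,
\qquad
\frac{35 - 28}{28} = 0.25
$$

In other words, compression costs 2.56 mAP points in exchange for a 25% higher frame rate.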
Journal of Parallel and Distributed Computing, Volume 201, Article 105085.
Citations: 0
Price-aware resource management for multi-modal DNN inference in collaborative heterogeneous edge environments
IF 3.4 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-04-04 | DOI: 10.1016/j.jpdc.2025.105080
Fengyi Huang, Wenhua Wang, Jianxiong Guo, Wentao Fan, Yang Xu, Tian Wang, Jiannong Cao
To address the limitations of ARM64-based AI edge devices, which are energy-efficient but computationally constrained, as well as those of general-purpose edge servers, this paper proposes a multi-modal Collaborative Heterogeneous Edge Computing (CHEC) architecture that achieves low latency and enhances computational capabilities. The CHEC framework, which is segmented into an edge private cloud and an edge public cloud, aims to optimize the profits of Edge Service Providers (ESPs) through dynamic heterogeneous resource management. In particular, this is achieved by formulating the challenge as a multi-stage Mixed-Integer Nonlinear Programming (MINLP) problem. We introduce a resource collaboration system based on resource leasing that incorporates three Economic Payment Models (EPMs), ensuring efficient and profitable resource utilization. To tackle this complex issue, we develop a three-layer Hybrid Deep Reinforcement Learning (HDRL) algorithm with EPMs, HDRL-EPMs, for efficient management of dynamic and heterogeneous resources. Extensive simulations confirm the algorithm's ability to ensure convergence and approximate optimal solutions, significantly outperforming existing methods. Testbed experiments demonstrate that the CHEC architecture reduces latency by up to 21.83% in real-world applications, markedly surpassing previous approaches.
Citations: 0
Embedded scaffolding for teaching and assessing inquiry-based hands-on laboratory on distributed systems
IF 3.4 Zone 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-04-03 DOI: 10.1016/j.jpdc.2025.105082
Jordi Guitart

Context

Information Technology education must cultivate proficiency in distributed systems, including strong hands-on laboratory skills, to meet the needs of society and industry. Given the complexity of distributed systems, any successful methodology for teaching them to novice students must be scaffolded appropriately to ensure that the students acquire the required degree of expertise.

Objective

We propose a comprehensive scaffolding approach for inquiry-based hands-on laboratories in a distributed systems course, which guides not only the learning process but also its assessment. The approach is based mainly on embedded scaffolds, namely explicit coding and experimental milestones and open questions with predefined grades, but also features contingent scaffolds provided by the teacher when additional assistance is needed.

Method

We apply the methodology in the context of the subject ‘Distributed Network Systems’ offered by our university. We compare the students' performance during the three academic courses taught with the proposed methodology against the three previous courses, which still used the former methodology. We use both visual representations and planned Analysis of Variance (ANOVA) tests to verify our hypothesis, which is defined as a complex contrast.

Findings

We find a statistically significant improvement in the students' performance with the new methodology, both in their assignment grades (F(1, 75.364) = 17.770, $p = 6.85 \times 10^{-5}$) and, more importantly, in their grades on the exam questions about the practicals (F(1, 123.186) = 13.285, $p = 3.93 \times 10^{-4}$).
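As a reading aid, the p-values above can be checked from the reported F statistics and degrees of freedom. The following is a minimal sketch using only the Python standard library; the function name `f_test_p_value` and the Simpson-rule integration are illustrative choices, not the paper's analysis pipeline, and the sketch assumes the p-values are plain upper-tail probabilities of the F distribution with a denominator degree of freedom above 2.

```python
import math


def f_test_p_value(f_stat, df1, df2, steps=20000):
    """Upper-tail probability P(F > f_stat) for an F(df1, df2) distribution.

    Uses the identity P(F > f) = I_x(df2/2, df1/2) with
    x = df2 / (df2 + df1 * f), where I_x is the regularized incomplete
    beta function. The integral is evaluated with composite Simpson's
    rule, which is adequate here because x stays well below 1.
    Assumes f_stat > 0, df2 > 2, and an even number of steps.
    """
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_stat)
    # log of the complete beta function B(a, b)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(s):
        # Normalized beta density s^(a-1) * (1-s)^(b-1) / B(a, b)
        if s <= 0.0:
            return 0.0  # correct limit because a > 1
        return math.exp(
            (a - 1.0) * math.log(s) + (b - 1.0) * math.log1p(-s) - log_beta
        )

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


# Statistics as reported in the Findings paragraph.
p_assignments = f_test_p_value(17.770, 1, 75.364)
p_exam = f_test_p_value(13.285, 1, 123.186)
print(f"assignments: p = {p_assignments:.3g}")
print(f"exam questions: p = {p_exam:.3g}")
```

The printed values can be compared against the figures quoted in the abstract; small differences in the last digit would be expected from rounding of the reported statistics.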

Implications

Our results encourage other instructors to incorporate embedded scaffolds for teaching and assessing their hands-on laboratories on distributed systems.
Citations: 0