
Latest articles from the Journal of Parallel and Distributed Computing

Line formation and scattering in silent programmable matter
IF 3.4 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2025-06-11 · DOI: 10.1016/j.jpdc.2025.105129
Alfredo Navarra, Francesco Piselli, Giuseppe Prencipe
Programmable Matter (PM) has been widely investigated in recent years. It refers to some kind of substance with the ability to change its physical properties (e.g., shape or color) in a programmable way. In this paper, we refer to the SILBOT model, where the particles live and move on a triangular grid, are asynchronous in their computations and movements, and do not possess any direct means of communication (silent) or memory of past events (oblivious).
Within SILBOT, we aim at studying Spanning problems, i.e., problems where the particles are required to suitably span all over the grid. We first address the Line Formation problem where the particles are required to end up in a configuration where they all lie on a line, i.e., they are aligned and connected. Secondly, we deal with the more general Scattering problem: starting from any initial configuration, we aim at reaching a final one where no particles occupy neighboring nodes. Furthermore, we investigate configurations where some nodes of the grid can be occupied by unmovable elements (i.e., obstacles) from both theoretical and experimental view points.
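As an illustration of the target condition, the Scattering requirement (no two particles on neighboring nodes) reduces to a simple occupancy check. The sketch below encodes the triangular grid in axial coordinates; this encoding and the helper names are assumptions for illustration, not the paper's SILBOT formalism:

```python
# Hypothetical sketch of the Scattering condition: no two particles may
# occupy neighboring nodes. The triangular grid is encoded in axial
# coordinates (an assumption; the SILBOT model is not reproduced here).
NEIGHBOR_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def is_scattered(particles):
    """True iff no occupied node has an occupied neighbor."""
    occupied = set(particles)
    return not any(
        (q + dq, r + dr) in occupied
        for (q, r) in occupied
        for (dq, dr) in NEIGHBOR_DIRS
    )
```

A final configuration of the Scattering problem is exactly one for which this predicate holds.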
Citations: 0
Mitigating DDoS attacks in containerized environments: A comparative analysis of Docker and Kubernetes
IF 3.4 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2025-06-11 · DOI: 10.1016/j.jpdc.2025.105130
Yung-Ting Chuang, Chih-Han Tu
Containerization has become the primary method for deploying applications, with web services being the most prevalent. However, exposing server IP addresses to external connections renders containerized services vulnerable to DDoS attacks, which can deplete server resources and hinder legitimate user access. To address this issue, we implement twelve different mitigation strategies, test them across three common types of web services, and conduct experiments on both Docker and Kubernetes deployment platforms. Furthermore, this study introduces a cross-platform, orchestration-aware evaluation framework that simulates realistic multi-service workloads and analyzes defense strategy performance under varying concurrency conditions. Experimental results indicate that Docker excels in managing white-listed traffic and delaying attacker responses, while Kubernetes achieves low completion times, minimum response times, and low failure rates by processing all requests simultaneously. Based on these findings, we provide actionable insights for selecting appropriate mitigation strategies tailored to different orchestration environments and workload patterns, offering practical guidance for securing containerized deployments against low-rate DDoS threats. Our work not only provides empirical performance evaluations but also reveals deployment-specific trade-offs, offering strategic recommendations for building resilient cloud-native infrastructures.
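For illustration, a common low-rate mitigation primitive of the kind such strategies build on is per-client rate limiting. The abstract does not enumerate the twelve strategies, so the token-bucket sketch below is an assumed, representative example rather than the paper's implementation:

```python
import time

class TokenBucket:
    """Per-client token-bucket rate limiter (an illustrative, assumed
    example; not one of the paper's twelve strategies specifically)."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        """Admit a request if a token is available, else reject it."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would keep one bucket per source IP, so a client flooding requests exhausts its burst and is throttled while legitimate traffic continues.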
Citations: 0
Leveraging Multi-Instance GPUs through moldable task scheduling
IF 3.4 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2025-06-06 · DOI: 10.1016/j.jpdc.2025.105128
Jorge Villarrubia, Luis Costero, Francisco D. Igual, Katzalin Olcoz
NVIDIA MIG (Multi-Instance GPU) allows partitioning a physical GPU into multiple logical instances with fully-isolated resources, which can be dynamically reconfigured. This work highlights the untapped potential of MIG through moldable task scheduling with dynamic reconfigurations. Specifically, we propose a makespan minimization problem for multi-task execution under MIG constraints. Our profiling shows that assuming monotonicity in task work with respect to resources is not viable, as is usual in multicore scheduling. Relying on a state-of-the-art proposal that does not require such an assumption, we present FAR, a 3-phase algorithm to solve the problem. Phase 1 of FAR builds on a classical task moldability method, phase 2 combines Longest Processing Time First and List Scheduling with a novel repartitioning tree heuristic tailored to MIG constraints, and phase 3 employs local search via task moves and swaps. FAR schedules tasks in batches offline, concatenating their schedules on the fly in an improved way that favors resource reuse. Excluding reconfiguration costs, the List Scheduling proof shows an approximation factor of 7/4 on the NVIDIA A30 model. We adapt the technique to the particular constraints of an NVIDIA A100/H100 to obtain an approximation factor of 2. Including the reconfiguration cost, our real-world experiments reveal a makespan with respect to the optimum no worse than 1.22× for a well-known suite of benchmarks, and 1.10× for synthetic inputs inspired by real kernels. We obtain good experimental results for each batch of tasks, but also in the concatenation of batches, with large improvements over the state-of-the-art and proposals without GPU reconfiguration. Moreover, we show that the proposed heuristics allow a correct adaptation to tasks of very different characteristics. 
Beyond the specific algorithm, the paper demonstrates the research potential of the MIG technology and suggests useful metrics, workload characterizations and evaluation techniques for future work in this field.
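The two classical ingredients of FAR's phase 2 can be sketched in a few lines. This is the textbook Longest Processing Time First + List Scheduling core only, without FAR's MIG repartitioning tree, task moldability, or reconfiguration costs:

```python
import heapq

def lpt_list_schedule(durations, n_machines):
    """List Scheduling in Longest-Processing-Time-First order: each task
    goes to the currently least-loaded machine; returns the makespan.
    (A sketch of the classical heuristics named above, not FAR itself.)"""
    heap = [(0.0, m) for m in range(n_machines)]  # (load, machine id)
    heapq.heapify(heap)
    for d in sorted(durations, reverse=True):     # LPT order
        load, m = heapq.heappop(heap)             # least-loaded machine
        heapq.heappush(heap, (load + d, m))
    return max(load for load, _ in heap)          # makespan
```

On two machines, durations [3, 3, 2, 2, 2] yield makespan 7 versus the optimum 6 (3+3 against 2+2+2), which is why approximation guarantees such as the 7/4 factor cited above matter.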
Citations: 0
Privacy-enabled academic certificate authentication and deep learning-based student performance prediction system using hyperledger blockchain technology
IF 3.4 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2025-06-05 · DOI: 10.1016/j.jpdc.2025.105119
Sangeetha A.S., Shunmugan S
Blockchain systems do not rely on trust for electronic transactions, and the technology has become popular due to attributes like immutability, transparency, distributed storage, and decentralized control. Student certificates and skill verification play crucial roles in job applications and other purposes. In traditional systems, certificate forgery is a common problem, especially in online education. Processes such as issuing and verifying student certificates, along with predicting student performance for higher education or job recruitment, are often lengthy and time-consuming. Integrating blockchain into certificate verification protocols offers authenticity and significantly reduces processing times. Hence, this research introduces a novel secure privacy-preservation-based academic certificate authentication system (CertAuthSystem) for verifying the academic certificates of students. The CertAuthSystem contains different entities: Student, System, University, Blockchain, and Company. The university issues certificates to students, which are stored in the Blockchain; when a student applies for a job or scholarship, they transmit the certificate and the blockID to the organization, based on which verification is performed. Moreover, the student's performance is predicted by a classifier named Deep Long Short-Term Memory (DLSTM). CertAuthSystem is then examined for its superiority on measures such as validation time, memory, throughput, and execution time, achieving values of 53.412 ms, 86.6 MB, 94.876 Mbps, and 73.57 ms, respectively, for block size 7. Finally, the prediction analysis of the DLSTM classifier is done based on evaluation metrics such as precision, recall, and F-measure, which attained superior values of 90.77 %, 92.99 %, and 91.86 %.
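The verification step described (a company checks a submitted certificate against the on-chain record) reduces to comparing digests. The sketch below hashes a canonical JSON record with SHA-256; the field names are hypothetical and stand in for, rather than reproduce, the paper's Hyperledger schema:

```python
import hashlib
import json

def certificate_digest(cert):
    """SHA-256 over a canonical JSON serialization of the certificate
    record (field names are hypothetical, not the paper's schema)."""
    canonical = json.dumps(cert, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def verify_certificate(cert, on_chain_digest):
    """What the company does with the digest stored under the student's
    blockID: recompute over the submitted certificate and compare."""
    return certificate_digest(cert) == on_chain_digest
```

Any tampering with the submitted record changes the digest, so a forged certificate fails verification against the immutable on-chain value.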
Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2025-06-05 · DOI: 10.1016/S0743-7315(25)00089-9
Citations: 0
Power, energy, and performance analysis of single- and multi-threaded applications in the ARM ThunderX2
IF 3.4 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2025-06-02 · DOI: 10.1016/j.jpdc.2025.105118
Ibai Calero, Salvador Petit, María E. Gómez, Julio Sahuquillo
Energy efficiency has been a major concern in data centers, and the problem is exacerbated as data-center sizes continue to rise. However, the lack of tools to measure and handle this energy at a fine granularity (e.g., per processor core or last-level cache) has translated into slow research progress on this topic. Understanding where (i.e., in which components) and when (at which point in time) energy consumption translates into only minor performance improvements is of paramount importance for designing any energy-aware scheduler. This paper characterizes the relationship between energy consumption and performance in a 28-core ARM ThunderX2 processor for both single-threaded and multi-threaded applications.
This paper shows that single-threaded applications with high CPU activity maintain their performance in spite of the inter-application interference at shared resources, but this comes at the expense of higher power consumption. Conversely, applications that heavily utilize the L3 cache and memory consume less power but suffer significant performance degradation as interference levels rise.
In contrast, multi-threaded applications show two distinct behaviors. On the one hand, some of them experience significant performance gains when they execute in a higher number of cores with more threads, which outweighs the increase in power consumption, leading to high energy efficiency.
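The trade-off in that last observation is plain energy accounting: extra power pays off when the speedup outweighs it. The numbers below are illustrative assumptions, not the paper's measurements:

```python
def energy_joules(avg_power_w, runtime_s):
    """Energy = average power x execution time."""
    return avg_power_w * runtime_s

# Illustrative (assumed) numbers: adding threads raises power 40 W -> 60 W
# but cuts runtime 100 s -> 55 s, so total energy still drops.
single_thread = energy_joules(40.0, 100.0)
multi_thread = energy_joules(60.0, 55.0)
```

Here the 1.82x speedup beats the 1.5x power increase, so the multi-threaded run consumes less total energy, which is the efficiency gain described above.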
Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2025-05-21 · DOI: 10.1016/S0743-7315(25)00079-6
Citations: 0
ConCeal: A Winograd convolution code template for optimising GCU in parallel
IF 3.4 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2025-05-21 · DOI: 10.1016/j.jpdc.2025.105108
Tian Chen, Yu-an Tan, Thar Baker, Haokai Wu, Qiuyu Zhang, Yuanzhang Li
By minimising arithmetic operations, Winograd convolution substantially reduces the computational complexity of convolution, a pivotal operation in the training and inference stages of Convolutional Neural Networks (CNNs). This study leverages the hardware architecture and capabilities of Shanghai Enflame Technology's AI accelerator, the General Computing Unit (GCU). We develop a code template named ConCeal for Winograd convolution with 3 × 3 kernels, employing a set of interrelated optimisations, including task partitioning, memory layout design, and parallelism. These optimisations fully exploit GCU's computing resources by optimising dataflow and parallelizing the execution of tasks on GCU cores, thereby enhancing Winograd convolution. Moreover, the integrated optimisations in the template are efficiently applicable to other operators, such as max pooling. Using this template, we implement and assess the performance of four Winograd convolution operators on GCU. The experimental results showcase that ConCeal operators achieve a maximum of 2.04× and an average of 1.49× speedup compared to the fastest GEMM-based convolution implementations on GCU. Additionally, the ConCeal operators demonstrate competitive or superior computing resource utilisation in certain ResNet and VGG convolution layers when compared to cuDNN on RTX2080.
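For readers unfamiliar with the transform, the 1-D case F(2,3) already shows the saving that the 2-D F(2×2, 3×3) form used for 3 × 3 kernels exploits: two outputs of a 3-tap convolution in 4 multiplications instead of 6. A minimal sketch with a direct-convolution reference for comparison:

```python
def winograd_f23(d, g):
    """1-D Winograd F(2,3): two outputs of the 3-tap filter g over the
    4-sample tile d, using 4 multiplications (m1..m4)."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_conv3(d, g):
    """Reference: direct (valid) 3-tap convolution, 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

In 2-D the same idea turns a 4×4 input tile and 3×3 kernel into 16 multiplications instead of 36, which is the arithmetic reduction the template maps onto GCU cores.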
Citations: 0
Thermal modeling and optimal allocation of avionics safety-critical tasks on heterogeneous MPSoCs
IF 3.4 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2025-05-20 · DOI: 10.1016/j.jpdc.2025.105107
Zdeněk Hanzálek, Ondřej Benedikt, Přemysl Šůcha, Pavel Zaykov, Michal Sojka
Multi-Processor Systems-on-Chip (MPSoC) can deliver high performance needed in many industrial domains, including aerospace. However, their high power consumption, combined with avionics safety standards, brings new thermal management challenges. This paper investigates techniques for offline thermal-aware allocation of periodic tasks on heterogeneous MPSoCs running at a fixed clock frequency, as required in avionics. The goal is to find the assignment of tasks to (i) cores and (ii) temporal isolation windows, as required in ARINC 653 standard, while minimizing the MPSoC temperature. To achieve that, we formulate a new optimization problem, we derive its NP-hardness, and we identify its subproblem solvable in polynomial time. Furthermore, we propose and analyze three power models, and integrate them within several novel optimization approaches based on heuristics, a black-box optimizer, and Integer Linear Programming (ILP). We perform the experimental evaluation on three popular MPSoC platforms (NXP i.MX8QM MEK, NXP i.MX8QM Ixora, NVIDIA TX2) and observe a difference of up to 5.5 °C among the tested methods (corresponding to a 22% reduction w.r.t. the ambient temperature). We also show that our method, integrating the empirical power model with the ILP, outperforms the other methods on all tested platforms.
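As a toy illustration of the allocation objective only (not the paper's ILP, which also models ARINC 653 isolation windows and calibrated power/thermal models), a greedy heuristic can balance per-core power under an assumed linear power-to-temperature model:

```python
def thermal_greedy(task_powers_w, n_cores, theta=2.0, t_ambient=25.0):
    """Assign each task (largest power draw first) to the core with the
    lowest accumulated power, then estimate the peak steady-state
    temperature as t_ambient + theta * max core power. The linear model
    and the value of theta are assumptions for illustration only."""
    core_power = [0.0] * n_cores
    assignment = {}
    for task, p in sorted(enumerate(task_powers_w),
                          key=lambda tp: tp[1], reverse=True):
        coolest = min(range(n_cores), key=core_power.__getitem__)
        core_power[coolest] += p
        assignment[task] = coolest
    peak_temp = t_ambient + theta * max(core_power)
    return assignment, peak_temp
```

Balancing power across cores lowers the hottest core's steady-state temperature, which is the intuition the exact ILP formulation optimizes under the avionics constraints.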
Optimal scheduling algorithms for software-defined radio pipelined and replicated task chains on multicore architectures
IF 3.4 CAS Tier 3 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-05-16 DOI: 10.1016/j.jpdc.2025.105106
Diane Orhan , Laércio Lima Pilla , Denis Barthou , Adrien Cassagne , Olivier Aumage , Romain Tajan , Christophe Jégo , Camille Leroux
Software-Defined Radio (SDR) represents a move from dedicated hardware to software implementations of digital communication standards. This approach offers flexibility, shorter time to market, maintainability, and lower costs, but it requires an optimized distribution of tasks in order to meet performance requirements. Thus, we study the problem of scheduling SDR linear task chains of stateless and stateful tasks for streaming processing. We model this problem as a pipelined workflow scheduling problem based on pipelined and replicated parallelism on homogeneous resources. We propose an optimal dynamic programming solution and an optimal greedy algorithm named OTAC for maximizing throughput while also minimizing resource utilization. Moreover, we prove the optimality of the proposed scheduling algorithms. We evaluate our solutions and compare their execution times and schedules to those of other algorithms, using synthetic task chains and an implementation of the DVB-S2 communication standard in the AFF3CT SDR Domain Specific Language. Our results demonstrate that OTAC quickly finds optimal schedules, consistently yielding better results than other algorithms, or equivalent results with fewer resources.
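Neither OTAC nor the dynamic program is reproduced here, but the trade-off they optimize — a pipeline's throughput is bounded by its most loaded stage — can be sketched with a generic contiguous chain-partitioning routine (made-up task costs; the greedy packing plus binary search shown is a textbook technique, not the paper's algorithm):

```python
def stages_needed(costs, period):
    """Greedily pack consecutive tasks into stages whose load stays <= period."""
    stages, load = 1, 0.0
    for c in costs:
        if c > period:
            return float("inf")  # one task alone already exceeds the period
        if load + c > period:
            stages, load = stages + 1, c  # open a new pipeline stage
        else:
            load += c
    return stages

def best_period(costs, cores, eps=1e-6):
    """Binary-search the smallest stage period achievable with `cores` stages."""
    lo, hi = max(costs), sum(costs)  # hi (everything in one stage) is always feasible
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if stages_needed(costs, mid) <= cores:
            hi = mid
        else:
            lo = mid
    return hi

costs = [3.0, 1.0, 2.0, 4.0, 2.0]  # per-frame processing time of each task
period = best_period(costs, cores=3)
print(period, 1.0 / period)  # stage period and the resulting throughput
```

In the paper's setting a stateless stage can additionally be replicated, which divides its effective load by the replica count; the sketch ignores that refinement, as well as the distinction between stateless and stateful tasks.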