Line formation and scattering in silent programmable matter
Alfredo Navarra, Francesco Piselli, Giuseppe Prencipe
Pub Date: 2025-06-11 | DOI: 10.1016/j.jpdc.2025.105129
Journal of Parallel and Distributed Computing, vol. 204, Article 105129

Programmable Matter (PM) has been widely investigated in recent years. It refers to a substance with the ability to change its physical properties (e.g., shape or color) in a programmable way. In this paper, we refer to the SILBOT model, where the particles live and move on a triangular grid, are asynchronous in their computations and movements, and possess neither direct means of communication (silent) nor memory of past events (oblivious).
Within SILBOT, we study Spanning problems, i.e., problems where the particles are required to suitably span the whole grid. We first address the Line Formation problem, where the particles are required to end up in a configuration in which they all lie on a line, i.e., they are aligned and connected. Secondly, we deal with the more general Scattering problem: starting from any initial configuration, the aim is to reach a final one where no two particles occupy neighboring nodes. Furthermore, we investigate configurations where some nodes of the grid can be occupied by unmovable elements (i.e., obstacles), from both theoretical and experimental viewpoints.
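The Scattering target above (no two particles on neighboring nodes of a triangular grid) can be checked mechanically. The sketch below uses axial coordinates, in which every node has six neighbors; the encoding is an assumption for illustration, not the paper's notation:

```python
# Check a "scattered" configuration on a triangular (six-neighbor) grid.
# Nodes are axial coordinates (q, r); these are the six neighbor offsets.
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def is_scattered(occupied):
    """True iff no two occupied nodes are adjacent on the grid."""
    occupied = set(occupied)
    return all(
        (q + dq, r + dr) not in occupied
        for (q, r) in occupied
        for (dq, dr) in NEIGHBORS
    )
```

For example, two particles two hops apart form a scattered configuration, while two on adjacent nodes do not.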
Mitigating DDoS attacks in containerized environments: A comparative analysis of Docker and Kubernetes
Yung-Ting Chuang, Chih-Han Tu
Pub Date: 2025-06-11 | DOI: 10.1016/j.jpdc.2025.105130
Journal of Parallel and Distributed Computing, vol. 204, Article 105130

Containerization has become the primary method for deploying applications, with web services being the most prevalent workload. However, exposing server IP addresses to external connections renders containerized services vulnerable to DDoS attacks, which can deplete server resources and hinder legitimate user access. To address this issue, we implement twelve mitigation strategies, test them across three common types of web services, and conduct experiments on both Docker and Kubernetes deployment platforms. Furthermore, this study introduces a cross-platform, orchestration-aware evaluation framework that simulates realistic multi-service workloads and analyzes defense-strategy performance under varying concurrency conditions. Experimental results indicate that Docker excels at managing white-listed traffic and delaying attacker responses, while Kubernetes achieves low completion times, minimal response times, and low failure rates by processing all requests simultaneously. Based on these findings, we provide actionable insights for selecting mitigation strategies tailored to different orchestration environments and workload patterns, offering practical guidance for securing containerized deployments against low-rate DDoS threats. Our work not only provides empirical performance evaluations but also reveals deployment-specific trade-offs, offering strategic recommendations for building resilient cloud-native infrastructures.
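The abstract does not enumerate the twelve strategies. As a generic illustration of rate-based DDoS mitigation against low-rate attacks, the sketch below implements a token bucket that throttles clients exceeding a sustained request rate; it is an assumed example, not necessarily one of the paper's strategies:

```python
class TokenBucket:
    """Per-client rate limiter: allow bursts up to `capacity` requests,
    refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = 0.0             # timestamp of the last check

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In a deployment, one bucket per client IP would sit in front of the web service (e.g., in an ingress controller), dropping or delaying requests once a client's bucket runs dry.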
Leveraging Multi-Instance GPUs through moldable task scheduling
Jorge Villarrubia, Luis Costero, Francisco D. Igual, Katzalin Olcoz
Pub Date: 2025-06-06 | DOI: 10.1016/j.jpdc.2025.105128
Journal of Parallel and Distributed Computing, vol. 204, Article 105128

NVIDIA MIG (Multi-Instance GPU) allows partitioning a physical GPU into multiple logical instances with fully isolated resources, which can be dynamically reconfigured. This work highlights the untapped potential of MIG through moldable task scheduling with dynamic reconfigurations. Specifically, we formulate a makespan-minimization problem for multi-task execution under MIG constraints. Our profiling shows that assuming task work to be monotonic in the allotted resources, as is usual in multicore scheduling, is not viable. Relying on a state-of-the-art proposal that does not require such an assumption, we present FAR, a three-phase algorithm to solve the problem. Phase 1 of FAR builds on a classical task-moldability method; phase 2 combines Longest Processing Time First and List Scheduling with a novel repartitioning-tree heuristic tailored to MIG constraints; and phase 3 employs local search via task moves and swaps. FAR schedules tasks in batches offline, concatenating their schedules on the fly in a way that favors resource reuse. Excluding reconfiguration costs, the List Scheduling proof yields an approximation factor of 7/4 on the NVIDIA A30 model. We adapt the technique to the particular constraints of the NVIDIA A100/H100 to obtain an approximation factor of 2. Including the reconfiguration cost, our real-world experiments reveal a makespan no worse than 1.22× the optimum for a well-known benchmark suite, and 1.10× for synthetic inputs inspired by real kernels. We obtain good experimental results for each batch of tasks, and also for the concatenation of batches, with large improvements over the state of the art and over proposals without GPU reconfiguration. Moreover, we show that the proposed heuristics adapt correctly to tasks of very different characteristics. Beyond the specific algorithm, the paper demonstrates the research potential of MIG technology and suggests useful metrics, workload characterizations and evaluation techniques for future work in this field.
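The Longest Processing Time First / List Scheduling core of phase 2 can be illustrated in isolation: sort tasks by decreasing duration, then greedily place each on the least-loaded instance. The sketch below assumes fixed task durations on identical instances and ignores the MIG repartitioning constraints and moldability that FAR handles on top:

```python
import heapq

def lpt_makespan(durations, n_instances):
    """Makespan of LPT list scheduling on identical instances:
    sort tasks longest-first, always assign to the least-loaded instance."""
    loads = [0.0] * n_instances        # min-heap of per-instance loads
    heapq.heapify(loads)
    for d in sorted(durations, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + d)
    return max(loads)
```

On the classic example [3, 3, 2, 2, 2] with two instances, LPT yields makespan 7 while the optimum is 6 (3+3 vs. 2+2+2), matching the known bound that greedy list scheduling is approximate, not exact.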
Privacy-enabled academic certificate authentication and deep learning-based student performance prediction system using hyperledger blockchain technology
Sangeetha A.S., Shunmugan S.
Pub Date: 2025-06-05 | DOI: 10.1016/j.jpdc.2025.105119
Journal of Parallel and Distributed Computing, vol. 204, Article 105119

Blockchain systems do not rely on trust for electronic transactions, and blockchain has emerged as a popular technology due to attributes such as immutability, transparency, distributed storage, and decentralized control. Student certificates and skill verification play crucial roles in job applications and other purposes. In traditional systems, certificate forgery is a common problem, especially in online education. Processes such as issuing and verifying student certificates, along with predicting student performance for higher education or job recruitment, are often lengthy and time-consuming. Integrating blockchain into certificate-verification protocols ensures authenticity and significantly reduces processing times. Hence, this research introduces a novel secure, privacy-preserving academic certificate authentication system (CertAuthSystem) for verifying students' academic certificates. CertAuthSystem involves several entities: Student, System, University, Blockchain, and Company. The university issues certificates to students, which are stored in the blockchain; when a student applies for a job or scholarship, they transmit the certificate and the blockID to the organization, which performs verification based on them. Moreover, the student's performance is predicted by a classifier named Deep Long Short-Term Memory (DLSTM). CertAuthSystem is then evaluated on measures such as validation time, memory, throughput, and execution time, achieving values of 53.412 ms, 86.6 MB, 94.876 Mbps, and 73.57 ms, respectively, for block size 7. Finally, the prediction analysis of the DLSTM classifier is performed on evaluation metrics such as precision, recall, and F-measure, attaining values of 90.77 %, 92.99 %, and 91.86 %, respectively.
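The issue/verify flow can be sketched in a minimal content-addressed form: the blockID is the SHA-256 hash of the certificate, and the ledger, modeled here as a plain dict rather than Hyperledger, records which hashes were issued. This is an assumed simplification for illustration, not the paper's actual protocol:

```python
import hashlib

def issue(ledger, certificate: bytes) -> str:
    """University side: record the certificate's hash on the ledger and
    return it as the blockID handed to the student."""
    block_id = hashlib.sha256(certificate).hexdigest()
    ledger[block_id] = True
    return block_id

def verify(ledger, certificate: bytes, block_id: str) -> bool:
    """Company side: the certificate is authentic iff its hash matches the
    presented blockID and that blockID was actually issued."""
    return (block_id in ledger
            and hashlib.sha256(certificate).hexdigest() == block_id)
```

Any tampering with the certificate bytes changes the hash, so a forged certificate fails verification even if the attacker knows a valid blockID.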
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
Pub Date: 2025-06-05 | DOI: 10.1016/S0743-7315(25)00089-9
Journal of Parallel and Distributed Computing, vol. 203, Article 105122
Power, energy, and performance analysis of single- and multi-threaded applications in the ARM ThunderX2
Ibai Calero, Salvador Petit, María E. Gómez, Julio Sahuquillo
Pub Date: 2025-06-02 | DOI: 10.1016/j.jpdc.2025.105118
Journal of Parallel and Distributed Computing, vol. 204, Article 105118

Energy efficiency has long been a major concern in data centers, and the problem is exacerbated as their size continues to grow. However, the lack of tools to measure and manage energy at fine granularity (e.g., per processor core or last-level cache) has slowed research progress on this topic. Understanding where (i.e., in which components) and when (at which point in time) energy consumption translates into only minor performance improvements is of paramount importance for designing any energy-aware scheduler. This paper characterizes the relationship between energy consumption and performance in a 28-core ARM ThunderX2 processor for both single-threaded and multi-threaded applications.
This paper shows that single-threaded applications with high CPU activity maintain their performance in spite of inter-application interference at shared resources, but at the expense of higher power consumption. Conversely, applications that heavily utilize the L3 cache and memory consume less power but suffer significant performance degradation as interference levels rise.
In contrast, multi-threaded applications show two distinct behaviors. On the one hand, some of them experience significant performance gains when executed on a larger number of cores with more threads, which outweighs the increase in power consumption, leading to high energy efficiency.
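The energy-efficiency argument for that first group of multi-threaded applications reduces to E = P × t: a shorter runtime can outweigh a higher power draw. A toy illustration with made-up numbers, not the paper's measurements:

```python
def energy_joules(avg_power_w: float, runtime_s: float) -> float:
    """Energy consumed by a run: average power (W) times runtime (s)."""
    return avg_power_w * runtime_s

# Hypothetical runs of the same application: more cores draw more power
# but finish sooner, so the total energy can still drop.
few_cores = energy_joules(20.0, 10.0)   # fewer cores, longer run
many_cores = energy_joules(35.0, 5.0)   # more cores, shorter run
```

Here the many-core run consumes 175 J against 200 J for the few-core run, i.e., higher power yet better energy efficiency, which is the behavior the abstract describes.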
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
Pub Date: 2025-05-21 | DOI: 10.1016/S0743-7315(25)00079-6
Journal of Parallel and Distributed Computing, vol. 202, Article 105112
ConCeal: A Winograd convolution code template for optimising GCU in parallel
Tian Chen, Yu-an Tan, Thar Baker, Haokai Wu, Qiuyu Zhang, Yuanzhang Li
Pub Date: 2025-05-21 | DOI: 10.1016/j.jpdc.2025.105108
Journal of Parallel and Distributed Computing, vol. 203, Article 105108

By minimising arithmetic operations, Winograd convolution substantially reduces the computational complexity of convolution, a pivotal operation in the training and inference stages of Convolutional Neural Networks (CNNs). This study leverages the hardware architecture and capabilities of Shanghai Enflame Technology's AI accelerator, the General Computing Unit (GCU). We develop a code template named ConCeal for Winograd convolution with 3 × 3 kernels, employing a set of interrelated optimisations, including task partitioning, memory layout design, and parallelism. These optimisations fully exploit the GCU's computing resources by optimising dataflow and parallelising task execution across GCU cores, thereby accelerating Winograd convolution. Moreover, the integrated optimisations in the template apply efficiently to other operators, such as max pooling. Using this template, we implement and assess the performance of four Winograd convolution operators on the GCU. The experimental results show that ConCeal operators achieve a maximum of 2.04× and an average of 1.49× speedup over the fastest GEMM-based convolution implementations on the GCU. Additionally, the ConCeal operators demonstrate competitive or superior computing-resource utilisation in certain ResNet and VGG convolution layers compared to cuDNN on an RTX 2080.
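The arithmetic saving behind Winograd convolution is easiest to see in the 1D case F(2,3), which computes two outputs of a 3-tap filter with four multiplications instead of six. The sketch below uses the standard F(2,3) transform matrices and illustrates the algorithm itself, not ConCeal's GCU implementation:

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def winograd_f2_3(d, g):
    """Winograd F(2,3): two outputs of a 3-tap convolution of a 4-element
    input tile d with kernel g, using 4 multiplies instead of 6.
    Computes y = A^T [(G g) * (B^T d)] with elementwise product *."""
    BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
    G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
    AT = [[1, 1, 1, 0], [0, 1, -1, -1]]
    U = matvec(G, g)                      # transformed kernel
    V = matvec(BT, d)                     # transformed input tile
    M = [u * v for u, v in zip(U, V)]     # the 4 multiplications
    return matvec(AT, M)                  # inverse transform -> 2 outputs

def direct_conv2(d, g):
    """Reference: the same two outputs by direct (valid) convolution."""
    return [d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
            d[1]*g[0] + d[2]*g[1] + d[3]*g[2]]
```

The 2D F(2×2, 3×3) variant used for 3 × 3 kernels nests the same transforms over rows and columns, which is where the bulk of the multiplication savings in CNN layers comes from.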
Thermal modeling and optimal allocation of avionics safety-critical tasks on heterogeneous MPSoCs
Zdeněk Hanzálek, Ondřej Benedikt, Přemysl Šůcha, Pavel Zaykov, Michal Sojka
Pub Date: 2025-05-20 | DOI: 10.1016/j.jpdc.2025.105107
Journal of Parallel and Distributed Computing, vol. 203, Article 105107

Multi-Processor Systems-on-Chip (MPSoCs) can deliver the high performance needed in many industrial domains, including aerospace. However, their high power consumption, combined with avionics safety standards, brings new thermal-management challenges. This paper investigates techniques for offline thermal-aware allocation of periodic tasks on heterogeneous MPSoCs running at a fixed clock frequency, as required in avionics. The goal is to find an assignment of tasks to (i) cores and (ii) temporal isolation windows, as required by the ARINC 653 standard, while minimizing the MPSoC temperature. To achieve this, we formulate a new optimization problem, derive its NP-hardness, and identify a subproblem solvable in polynomial time. Furthermore, we propose and analyze three power models and integrate them within several novel optimization approaches based on heuristics, a black-box optimizer, and Integer Linear Programming (ILP). We perform an experimental evaluation on three popular MPSoC platforms (NXP i.MX8QM MEK, NXP i.MX8QM Ixora, NVIDIA TX2) and observe a difference of up to 5.5 °C among the tested methods (corresponding to a 22% reduction w.r.t. the ambient temperature). We also show that our method, integrating the empirical power model with the ILP, outperforms the other methods on all tested platforms.
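For intuition, the allocation problem can be sketched at toy scale: under a hypothetical affine per-core power model (idle + slope × utilization, with per-core slopes standing in for heterogeneity), exhaustive search finds the task-to-core assignment that minimizes the peak per-core power, a crude proxy for the hotspot temperature. Neither the model nor the numbers below come from the paper:

```python
from itertools import product

def best_allocation(utils, idle, slope):
    """Brute-force: assign each task (given as a CPU utilization) to a core,
    respecting per-core capacity 1.0, minimizing the peak per-core power
    under the assumed model P_c = idle[c] + slope[c] * load[c]."""
    n_cores = len(idle)
    best_peak, best_assign = float("inf"), None
    for assign in product(range(n_cores), repeat=len(utils)):
        load = [0.0] * n_cores
        for u, c in zip(utils, assign):
            load[c] += u
        if any(l > 1.0 for l in load):
            continue                      # core over-utilized: infeasible
        peak = max(i + s * l for i, s, l in zip(idle, slope, load))
        if peak < best_peak:
            best_peak, best_assign = peak, assign
    return best_assign, best_peak
```

Real instances need the paper's ILP or heuristics, since this enumeration is exponential in the number of tasks and also omits the ARINC 653 temporal-isolation windows.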
Optimal scheduling algorithms for software-defined radio pipelined and replicated task chains on multicore architectures
Diane Orhan, Laércio Lima Pilla, Denis Barthou, Adrien Cassagne, Olivier Aumage, Romain Tajan, Christophe Jégo, Camille Leroux
Pub Date: 2025-05-16 | DOI: 10.1016/j.jpdc.2025.105106
Journal of Parallel and Distributed Computing, vol. 204, Article 105106

Software-Defined Radio (SDR) represents a move from dedicated hardware to software implementations of digital communication standards. This approach offers flexibility, shorter time to market, maintainability, and lower costs, but it requires an optimized distribution of tasks in order to meet performance requirements. Thus, we study the problem of scheduling SDR linear task chains of stateless and stateful tasks for streaming processing. We model this problem as a pipelined workflow scheduling problem based on pipelined and replicated parallelism on homogeneous resources. We propose an optimal dynamic programming solution and an optimal greedy algorithm named OTAC for maximizing throughput while also minimizing resource utilization, and we prove the optimality of the proposed scheduling algorithm. We evaluate our solutions and compare their execution times and schedules to other algorithms using synthetic task chains and an implementation of the DVB-S2 communication standard on the AFF3CT SDR Domain Specific Language. Our results demonstrate that OTAC quickly finds optimal schedules, consistently leading to better results than other algorithms, or equivalent results with fewer resources.
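A simplified version of the underlying pipelined-workflow problem, partitioning a linear task chain into contiguous stages on homogeneous cores with no replication and no stateful/stateless distinction, can be solved by a classic dynamic program that minimizes the bottleneck stage time, and hence maximizes steady-state throughput. This is an illustrative sketch, not OTAC itself:

```python
def chain_bottleneck(costs, n_stages):
    """Partition a linear task chain (per-task costs) into n_stages
    contiguous stages, minimizing the maximum stage cost (the pipeline
    bottleneck). Steady-state throughput is 1 / bottleneck."""
    n = len(costs)
    prefix = [0]
    for c in costs:                       # prefix sums for O(1) stage cost
        prefix.append(prefix[-1] + c)
    INF = float("inf")
    # f[i][k] = best bottleneck for the first i tasks split into k stages
    f = [[INF] * (n_stages + 1) for _ in range(n + 1)]
    f[0][0] = 0
    for i in range(1, n + 1):
        for k in range(1, min(i, n_stages) + 1):
            for j in range(k - 1, i):     # last stage covers tasks j..i-1
                stage = prefix[i] - prefix[j]
                f[i][k] = min(f[i][k], max(f[j][k - 1], stage))
    return f[n][n_stages]
```

For the chain [4, 2, 3, 5] on two cores, the best split is [4, 2 | 3, 5] with bottleneck 8; replication of stateless tasks, which OTAC additionally exploits, can lower the bottleneck further by duplicating the heaviest stages.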