
Journal of Systems Architecture: Latest Publications

Efficient column-wise N:M pruning on RISC-V CPU
IF 4.1 · CAS Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-12-26 · DOI: 10.1016/j.sysarc.2025.103667
Chi-Wei Chu, Ding-Yong Hong, Jan-Jan Wu
In deep learning frameworks, weight pruning is a widely used technique for improving computational efficiency by reducing the size of large models. This is especially critical for convolutional operators, which often act as performance bottlenecks in convolutional neural networks (CNNs). However, the effectiveness of pruning heavily depends on how it is implemented, as different methods can significantly impact both computational performance and memory footprint. In this work, we propose a column-wise N:M pruning strategy applied at the tile level and modify XNNPACK to enable efficient execution of pruned models on the RISC-V vector architecture. Additionally, we propose fusing the operations of im2col and data packing to minimize redundant memory accesses and memory overhead. To further optimize performance, we incorporate AITemplate’s profiling technique to identify the optimal implementation for each convolutional operator. Our proposed approach effectively increases ResNet inference throughput by as much as 4×, and preserves ImageNet top-1 accuracy within 2.1% of the dense baseline. The code of our framework is publicly available at https://github.com/wewe5215/AI_template_RVV_backend
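The core pruning pattern (keep the N largest-magnitude weights in every group of M consecutive elements down each column) can be sketched in a few lines of numpy. This is an illustrative re-implementation of the sparsity pattern only, under assumed function and parameter names, not the paper's XNNPACK/RISC-V vector kernels:

```python
import numpy as np

def column_wise_nm_prune(weights, n=2, m=4):
    """Keep the n largest-magnitude entries in every group of m
    consecutive elements down each column; zero out the rest."""
    rows, cols = weights.shape
    assert rows % m == 0, "column length must be a multiple of m"
    pruned = np.zeros_like(weights)
    for c in range(cols):
        groups = weights[:, c].reshape(-1, m)              # m-element groups down the column
        keep = np.argsort(np.abs(groups), axis=1)[:, -n:]  # indices of n largest per group
        for g in range(groups.shape[0]):
            pruned[g * m + keep[g], c] = groups[g, keep[g]]
    return pruned
```

After pruning, each column carries exactly N nonzeros per M-element group, which is what makes the tile-level storage and vectorized execution regular.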
Journal of Systems Architecture, Volume 172, Article 103667.
Citations: 0
MxGPU: Efficient and safe communication between GPGPU applications in an OS-controlled GPGPU multiplexing environment
IF 4.1 · CAS Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-12-24 · DOI: 10.1016/j.sysarc.2025.103669
Marcel Lütke Dreimann, Olaf Spinczyk
With the growing demand for artificial intelligence and other data-intensive applications, the demand for graphics processing units (GPUs) has also increased. Even though there are many approaches to multiplexing GPUs, none of the approaches known to us enables the operating system to coherently integrate GPU resources alongside CPU resources into holistic resource management. Due to the history of GPUs, GPU drivers are still a large, isolated part of the driver stack of operating systems. This paper conducts a case study on what a multiplexing solution for GPGPUs could look like, where the OS is able to define scheduling policies for GPGPU tasks and manage GPU memory. OS-controlled GPU memory management can be especially helpful for efficient and safe communication between GPGPU applications. We discuss and evaluate the architecture of MxGPU, which offers software-based multiplexing of integrated Intel GPUs. MxGPU has a tiny code base, which is a precondition for formal verification approaches and usage in safety-critical environments. Experiments with our prototype show that MxGPU can grant the operating system control over GPU resources while allowing more GPU sessions. Furthermore, MxGPU allows GPGPU tasks to execute with lower latency than under Linux and enables efficient and safe communication between GPU applications.
Journal of Systems Architecture, Volume 172, Article 103669.
Citations: 0
Thwarting gradient inversion in federated learning via generative shadow mapping defense
IF 4.1 · CAS Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-12-22 · DOI: 10.1016/j.sysarc.2025.103671
Hui Zhou , Yuling Chen , Zheng Qin , Xin Deng , Ziyu Peng
Federated learning (FL) has garnered significant attention in the Artificial Intelligence of Things (AIoT) domain. It enables collaborative learning across distributed, privacy-sensitive devices without compromising their local data. However, existing research indicates that adversaries can still reconstruct the raw data from the observed gradients, resulting in a privacy breach. To further strengthen privacy in FL, various defense measures have been proposed, ranging from encryption-based and perturbation-based methods to advanced adaptive strategies. However, nearly all such defenses are applied directly to raw data or gradients, where the private information inherently resides. This intrinsic presence of sensitive data inevitably leaves FL vulnerable to privacy leakage. Thus, a new defense that can “erase” private information is urgently needed. In this paper, we propose Shade, a shadow mapping defense framework against gradient inversion attacks that uses generative models in FL. We implement two instances of the manifold defense method, ShadeGAN and ShadeDiff, based on generative adversarial networks and diffusion models, respectively. In particular, we first generate alternative shadow data to take part in model training. Subsequently, we construct a surrogate model to replace the raw model, eliminating the raw model’s memory. Finally, an optional gradient protection mechanism is provided, which operates by mapping raw gradients to their shadow counterparts. Extensive experiments demonstrate that our scheme can prevent adversaries from reconstructing raw data, effectively reducing the risk of FL privacy disclosure.
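The intuition behind shadow mapping can be illustrated with a toy linear model: the gradient that leaves the device is computed on generated shadow data, so a gradient inversion attack would reconstruct the shadow batch rather than the raw one. Everything below (the model, data shapes, and the random noise standing in for a GAN/diffusion generator) is a hypothetical sketch, not the Shade framework:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_linear(w, X, y):
    """Gradient of mean squared error for a linear model y ~ X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

# Raw private batch (never shared) and a generated "shadow" batch.
# In Shade the shadow data would come from a generative model; here we
# simply sample noise with matching shapes as a stand-in.
X_raw, y_raw = rng.normal(size=(16, 4)), rng.normal(size=16)
X_shadow, y_shadow = rng.normal(size=(16, 4)), rng.normal(size=16)

w = np.zeros(4)
g_raw = grad_linear(w, X_raw, y_raw)          # stays on the device
g_shared = grad_linear(w, X_shadow, y_shadow)  # only this gradient is shared
```

An adversary inverting `g_shared` can at best recover the shadow batch, which carries no private information by construction.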
Journal of Systems Architecture, Volume 172, Article 103671.
Citations: 0
ARAS: Adaptive low-cost ReRAM-based accelerator for DNNs
IF 4.1 · CAS Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-12-20 · DOI: 10.1016/j.sysarc.2025.103668
Mohammad Sabri, Marc Riera, Antonio González
Processing Using Memory (PUM) accelerators have the potential to perform Deep Neural Network (DNN) inference by using arrays of memory cells as computation engines. Among various memory technologies, ReRAM crossbars show promising performance in computing dot-product operations in the analog domain. Nevertheless, the expensive writing procedure of ReRAM cells has led researchers to design accelerators whose crossbars have enough capacity to store the full DNN. Given the tremendous and continuous increase in DNN model sizes, this approach is infeasible for some networks, or inefficient due to the huge hardware requirements. These accelerators lack the flexibility to adapt to any given DNN model, facing an adaptability challenge.
To address this issue, we introduce ARAS, a cost-effective ReRAM-based accelerator that employs an offline scheduler to adapt different DNNs to the resource-limited hardware. ARAS also overlaps the computation of a layer with the weight writing of several layers to mitigate the high writing latency of ReRAM. Furthermore, ARAS introduces three optimizations aimed at reducing the energy overheads of writing in ReRAM. Our key optimization capitalizes on the observation that DNN weights can be re-encoded to augment their similarity between layers, increasing the number of bit values that are equal or similar when overwriting ReRAM cells and, hence, reducing the energy required to update the cells. Overall, ARAS greatly reduces ReRAM writing activity. We evaluate ARAS on a popular set of DNNs. ARAS provides up to 2.2× speedup and 45% energy savings over a baseline PUM accelerator without any optimization. Compared to a TPU-like accelerator, ARAS provides up to 1.5× speedup and 62% energy savings.
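The key re-encoding idea (rewrite the incoming weights so they differ from the already-stored codes in fewer bit positions) can be mimicked with a toy scheme: per crossbar column, store either the next layer's 4-bit codes or their bitwise complements, whichever requires fewer cell updates, plus a one-bit flag. This is a hypothetical stand-in for ARAS's actual re-encoding, for illustration only:

```python
import numpy as np

def bit_flips(a, b, bits=4):
    """Number of cell updates needed to overwrite codes a with codes b."""
    x = np.bitwise_xor(a, b)
    return sum(int(np.sum((x >> k) & 1)) for k in range(bits))

def reencode(prev, nxt, bits=4):
    """Per-column choice between nxt and its bitwise complement, keeping
    whichever is closer (fewer bit flips) to what is already stored."""
    mask = (1 << bits) - 1
    out = nxt.copy()
    flags = np.zeros(nxt.shape[1], dtype=bool)  # complement flags, one per column
    for c in range(nxt.shape[1]):
        plain = bit_flips(prev[:, c:c + 1], nxt[:, c:c + 1], bits)
        inverted = bit_flips(prev[:, c:c + 1], mask ^ nxt[:, c:c + 1], bits)
        if inverted < plain:
            out[:, c] = mask ^ nxt[:, c]
            flags[c] = True
    return out, flags
```

By construction the re-encoded array never needs more cell updates than writing the raw codes, at the cost of one flag bit per column to undo the complement on readout.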
Journal of Systems Architecture, Volume 172, Article 103668.
Citations: 0
Effective reinforcement learning-based dynamic flexible job shop scheduling using two-stage dispatching
IF 4.1 · CAS Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-12-19 · DOI: 10.1016/j.sysarc.2025.103664
Jiepin Ding , Jun Xia , Yutong Ye, Mingsong Chen
Deep Reinforcement Learning (DRL) has been recognized as a promising means for solving the Dynamic Flexible Job Shop Scheduling Problem (DFJSP), where involved jobs have both distinct start times and due dates. However, due to improper DRL modeling of scheduling components, it is hard to guarantee the quality (e.g., makespan, resource utilization) of job-to-machine dispatching solutions for DFJSP. This is mainly because (i) most existing DRL-based methods design actions as composite rules by combining the processes of operation sequencing and machine assignment, which inevitably limits their adaptability to ever-changing scheduling scenarios; and (ii) without considering knowledge sharing among DRL network nodes, the learned policy networks with bounded sizes cannot be applied to complex large-scale scheduling problems. To address this problem, this paper introduces a novel DRL-based two-stage dispatching method that can effectively solve the DFJSP and achieve scheduling solutions of better quality. In our approach, the first stage utilizes a graph neural network-based policy network to facilitate optimal operation selection at each dispatching point. Since the policy network is size-agnostic and can share knowledge among DRL network nodes through graph embedding, it can handle DFJSP instances of varying scales. For the second stage, by decoupling the dependencies between operations and machines, we propose an effective machine selection heuristic that can derive more dispatching rules to improve the adaptability of DRL to various complex dynamic scheduling scenarios. Comprehensive experimental results demonstrate the superiority of our approach over state-of-the-art methods from the perspectives of both scheduling solution quality and the adaptability of learned DRL models.
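The two-stage structure can be sketched with placeholder heuristics: stage 1 orders ready operations (here by earliest due date, standing in for the learned GNN policy) and stage 2 assigns the chosen operation to the eligible machine that completes it earliest. The job/machine encoding below is hypothetical, for illustration only:

```python
def dispatch(jobs, machines):
    """Two-stage dispatching sketch.
    Stage 1: pick the next operation (earliest due date stands in for
             the GNN-based policy network).
    Stage 2: machine selection heuristic -- assign the operation to the
             eligible machine with the earliest completion time."""
    machine_free = {m: 0 for m in machines}      # when each machine becomes idle
    schedule = []
    ready = sorted(jobs, key=lambda j: j["due"])  # stage 1 ordering
    for job in ready:
        # stage 2: earliest completion among eligible machines
        best = min(job["eligible"],
                   key=lambda m: max(machine_free[m], job["start"]) + job["proc"][m])
        begin = max(machine_free[best], job["start"])
        machine_free[best] = begin + job["proc"][best]
        schedule.append((job["id"], best, begin, machine_free[best]))
    return schedule
```

Decoupling the two stages is what lets the machine-selection rule be swapped out independently of the operation-selection policy.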
Journal of Systems Architecture, Volume 172, Article 103664.
Citations: 0
WASP: Stack protection for WebAssembly
IF 4.1 · CAS Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-12-17 · DOI: 10.1016/j.sysarc.2025.103666
Ewan Massey, Pierre Olivier
WebAssembly is a binary executable format designed as a compilation target enabling high-level language code to be run natively in web browsers, JavaScript runtimes, and standalone interpreters. Previous work has highlighted WebAssembly’s vulnerability to traditional memory exploits, such as stack smashing (stack-based buffer overflows), when compiled from memory-unsafe languages. Such vulnerabilities are used as a component in impactful end-to-end exploits; hence, mitigations against memory exploits, such as stack canaries, need to be designed and implemented in WebAssembly. We present WASP, an implementation of stack-based buffer overflow protection using stack canaries within Emscripten, the leading C and C++ to WebAssembly compiler. Further, we extend the standard stack smashing protection design, offering extra security against canary leak attacks by randomizing the canary on a per-function-call basis. We verify WASP’s effectiveness against proof-of-concept exploits. Evaluation results show that the overheads WASP introduces in execution time, executable binary size, and compilation workflow are negligible to low on all platforms considered: the Chromium web browser, the Node.js JavaScript runtime, and the standalone WebAssembly runtimes Wasmer and WAVM.
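The per-call canary randomization can be illustrated at a high level (in Python rather than Wasm, since the point is the protocol, not the memory layout): each call places a fresh random canary after the buffer and verifies it before returning, so a leaked canary from one call is useless for the next. This is a toy model of the idea, not WASP's actual frame layout:

```python
import secrets

class StackSmashError(RuntimeError):
    """Raised when the canary check in the simulated epilogue fails."""

def with_canary(buffer_size, writer):
    """Run an untrusted write against a simulated stack frame that is
    guarded by a per-call random canary placed right after the buffer."""
    canary = secrets.token_bytes(8)              # re-randomized on every call
    frame = bytearray(buffer_size) + bytearray(canary)
    writer(frame)                                # untrusted write into the frame
    if bytes(frame[buffer_size:]) != canary:     # epilogue check
        raise StackSmashError("stack smashing detected")
    return bytes(frame[:buffer_size])
```

A write that stays within the buffer returns normally; a write that runs past it clobbers the canary and is caught before "return".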
Journal of Systems Architecture, Volume 172, Article 103666.
Citations: 0
ECDPA: An enhanced concurrent differentially private algorithm in electric vehicles for parallel queries
IF 4.1 · CAS Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-12-16 · DOI: 10.1016/j.sysarc.2025.103665
Mohsin Ali , Muneeb Ul Hassan , Pei-Wei Tsai , Jinjun Chen
As the adoption of electric vehicles (EVs) has skyrocketed in the past few decades, data-dependent services integrated into charging stations (CS) raise additional alarming concerns. Adversaries exploiting the privacy of individuals have been countered extensively by deploying techniques such as differential privacy (DP) and encryption-based approaches. However, these previous approaches worked effectively with sequential or single queries, but were not useful for parallel queries. This paper proposes a novel and interactive approach termed CDP-INT, which aims to tackle multiple queries targeted at the same dataset, precluding exploitation of users’ sensitive information. The proposed mechanism is tailored for EVs and CS, in which the total privacy budget ϵ is distributed among a number of parallel queries. This research ensures robust protection of privacy in response to multiple queries, maintaining an optimal trade-off between utility and privacy by dynamically allocating ϵ in a concurrent model. Furthermore, the experimental evaluation showcases the efficacy of CDP-INT in comparison to other approaches that use a sequential mechanism to handle queries. The experimental evaluation also confirms that CDP-INT is a viable solution offering privacy for sensitive information in response to multiple queries.
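The budget-splitting idea can be sketched as follows: k parallel counting queries against the same dataset each receive a share of the total budget ϵ, and each answer is perturbed with Laplace noise scaled to its share (standard sequential composition). The uniform split and the function names are assumptions for illustration, not the CDP-INT allocation policy:

```python
import numpy as np

def answer_parallel(queries, data, epsilon_total, weights=None):
    """Answer k counting queries on the same dataset, dividing the total
    privacy budget across them and adding Laplace noise scaled to each
    query's share. Counting queries have sensitivity 1, so the Laplace
    scale for a query with budget eps_i is 1/eps_i."""
    k = len(queries)
    weights = weights or [1 / k] * k              # uniform split by default
    assert abs(sum(weights) - 1) < 1e-9, "shares must sum to the full budget"
    rng = np.random.default_rng()
    answers = []
    for q, w in zip(queries, weights):
        eps_i = epsilon_total * w
        true_count = sum(1 for row in data if q(row))
        answers.append(true_count + rng.laplace(scale=1 / eps_i))
    return answers
```

Non-uniform `weights` let a dynamic allocator give more budget (less noise) to the queries that need higher utility, which is the knob an adaptive scheme would tune.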
Journal of Systems Architecture, Volume 172, Article 103665.
Citations: 0
GainP: A Gain Cell Embedded DRAM-based associative in-memory processor
IF 4.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-15 | DOI: 10.1016/j.sysarc.2025.103632
Yaniv Levi, Odem Harel, Adam Teman, Leonid Yavits
Associative processors (APs) are massively parallel in-memory SIMD accelerators. Although fairly well known, APs have been revisited in recent years due to the proliferation of data-centric computing and, specifically, processing using memory. APs are based on Content Addressable Memory (CAM) and exploit its unique ability to search the entire memory content simultaneously for a query pattern, enabling massively parallel computation in memory. Several memory infrastructures have been considered for associative processing, including static CMOS, resistive, magnetoresistive, ferroelectric, and even NAND flash memories. While each has certain merits (speed and low energy consumption for static CMOS; density for resistive and ferroelectric memories), each also faces challenges (low density for static CMOS and magnetoresistive memories; limited write endurance and high write energy for resistive and ferroelectric memories) that limit the scalability and usefulness of APs. This work introduces GainP, an AP based on silicon-proven Gain Cell embedded DRAM (GCeDRAM), which combines relatively high density (compared to static CMOS memory) with low energy, high speed, practically unlimited endurance, and low production costs (compared to emerging memory technologies). Using sparse-by-sparse matrix multiplication, we show that GainP outperforms a high-performance CPU and GPU by 825× and 41×, respectively. GainP also outperforms the state-of-the-art processing-in-memory sparse matrix multiplication accelerators GAS, OuterSPACE, and MatRaptor by 128×, 125×, and 16×, respectively, with average energy benefits of 96×, 95×, and 15×.
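The compare/write primitive that associative processors build on can be mimicked in software. This is a toy model for intuition only (the function names are mine, and real AP hardware performs each phase on all rows in a single cycle, not by a linear scan), not the GainP microarchitecture:

```python
def ap_compare(rows, key, mask):
    """CAM 'compare' phase: tag every row whose masked bits equal the
    masked key. Hardware does this for all rows at once; here we model
    it with a scan over the row list."""
    return [i for i, r in enumerate(rows) if (r & mask) == (key & mask)]

def ap_write(rows, tags, value, mask):
    """'Write' phase: update the masked bits of every tagged row."""
    for i in tags:
        rows[i] = (rows[i] & ~mask) | (value & mask)
    return rows
```

Arithmetic on an AP (e.g., vector addition) is realized as a short sequence of such compare/write passes over a truth table, which is what lets one pass operate on every memory row in parallel.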
Citations: 0
DeSpa: Heterogeneous multi-core accelerators for energy-efficient dense and sparse computation at the tile level in Deep Neural Networks
IF 4.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-12 | DOI: 10.1016/j.sysarc.2025.103650
Hyungjun Jang , Dongho Ha , Hyunwuk Lee , Won Woo Ro
The rapid evolution of Deep Neural Networks (DNNs) has driven significant advances in Domain-Specific Accelerators (DSAs). However, efficiently exploiting DSAs across diverse workloads remains challenging because complementary techniques, from sparsity-aware computation to system-level innovations such as multi-core architectures, have progressed independently. Our analysis reveals pronounced tile-level sparsity variation within DNNs, which causes efficiency fluctuations on homogeneous accelerators built solely from dense or sparsity-oriented cores. To address this challenge, we present DeSpa, a novel heterogeneous multi-core accelerator architecture that integrates both dense and sparse cores to adapt dynamically to tile-level sparsity variation. DeSpa is paired with a heterogeneity-aware scheduler that employs a tile-stealing mechanism to maximize core utilization and minimize idle time. Compared to a homogeneous sparse multi-core baseline, DeSpa reduces energy consumption by 33% and improves energy-delay product (EDP) by 14%, albeit at the cost of a 35% latency increase. Relative to a homogeneous dense baseline, it reduces EDP by 44%, cuts energy consumption by 42%, and delivers a 1.34× speed-up.
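The tile-routing idea can be sketched as follows. The threshold value and queue discipline are illustrative assumptions (the paper's scheduler is hardware, not Python): each tile's measured zero fraction decides whether it is queued for a dense or a sparse core, and an idle core steals work from the other queue.

```python
from collections import deque

def sparsity(tile):
    """Fraction of zero-valued weights in a tile."""
    return sum(1 for w in tile if w == 0) / len(tile)

def schedule_tiles(tiles, threshold=0.5):
    """Route each tile to a dense-core or sparse-core queue by its
    measured tile-level sparsity (threshold is an assumed knob)."""
    dense_q, sparse_q = deque(), deque()
    for t in tiles:
        (sparse_q if sparsity(t) >= threshold else dense_q).append(t)
    return dense_q, sparse_q

def steal(victim_q, idle_q):
    """Tile stealing: an idle core pulls one tile from the other
    queue's tail so neither core type sits idle."""
    if victim_q:
        idle_q.append(victim_q.pop())
```

The stealing step is what smooths out the efficiency fluctuations: when one core type drains its queue first, it keeps working on tiles originally routed to the other type.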
Citations: 0
Dependency-aware microservices offloading in ICN-based edge computing testbed
IF 4.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-08 | DOI: 10.1016/j.sysarc.2025.103663
Muhammad Nadeem Ali , Ihsan Ullah , Muhammad Imran , Muhammad Salah ud din , Byung-Seo Kim
Information-Centric Networking (ICN)-based edge computing has demonstrated remarkable potential in providing the ultra-low-latency, reliable communication needed to offload compute-intensive applications. Such applications are often composed of interdependent microservices that demand abundant communication and intensive computing resources. To avoid dependency conflicts, these microservices are typically arranged in a predefined sequence prior to offloading; however, this introduces waiting time for each microservice in the sequence. This paper presents an ICN edge-computing testbed framework to demonstrate the practical applicability of a scheme named IFCNS, which reduces the offloading time of dependent microservices compared to an existing scheme named OTOOA. In the testbed, both schemes are implemented in Python on Raspberry Pi devices atop the Named Data Networking (NDN) codebase. Furthermore, this paper outlines the complete testbed development procedure, including hardware and software configuration. To evaluate the effectiveness of IFCNS, modifications are applied to NDN naming, the microservice-tracking functions, and the forwarding strategy. The experimental results corroborate the effectiveness of IFCNS over OTOOA, demonstrating superior performance in time consumption, average interest satisfaction delay, energy consumption, FIB table load, and average naming overhead.
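The predefined offloading sequence for interdependent microservices amounts to a topological order of the dependency graph. A sketch using Kahn's algorithm (my illustration of the general technique, not the IFCNS implementation):

```python
from collections import deque

def offload_order(deps):
    """Kahn's topological sort: produce an offloading sequence in which
    every microservice appears after all services it depends on.
    `deps` maps service -> set of prerequisite services."""
    indeg = {s: len(p) for s, p in deps.items()}
    ready = deque(s for s, d in indeg.items() if d == 0)
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        # Releasing s may make its dependents ready.
        for t, prereqs in deps.items():
            if s in prereqs:
                indeg[t] -= 1
                if indeg[t] == 0:
                    ready.append(t)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order
```

Each service in the resulting sequence must wait for its predecessors to finish, which is exactly the per-microservice waiting time the abstract describes.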
Citations: 0