
Journal of Systems Architecture: Latest Publications

MECSim: A comprehensive simulation platform for multi-access edge computing
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-15 | DOI: 10.1016/j.sysarc.2026.103706
Akhirul Islam, Manojit Ghose
The rapid growth of CPU-intensive and latency-sensitive applications has intensified the need for efficient resource management within edge computing environments. While existing simulators such as iFogSim, EdgeCloudSim, and PureEdgeSim have contributed significantly to edge computing research, they lack comprehensive support for modeling modern hardware heterogeneity, energy-aware mechanisms, service providers’ economic models, dependent task modeling, and reliability-driven task management. This paper presents MECSim (multi-access edge computing simulator), an enhanced simulation framework that extends PureEdgeSim to enable realistic modeling of heterogeneous, cooperative, and fault-tolerant edge computing ecosystems. MECSim supports multi-data-center clusters along with dynamic voltage and frequency scaling (DVFS) capable user devices for energy-efficient operation. The framework further integrates dependent-task modeling, cost and profit evaluation for service providers, and reliability mechanisms via transient-failure simulation, caching, and task replication. We have implemented five state-of-the-art approaches, demonstrating the effectiveness of our simulation platform and building confidence in its practical utility to handle diverse system architectures. With its extensible architecture and comprehensive modeling capabilities, MECSim provides a promising platform for future research on energy-efficient, profit-driven, and fault-tolerant task offloading and scheduling in heterogeneous MEC environments. The results also demonstrate that MECSim achieves a 44.13% (on average) reduction in simulation time compared to EdgeCloudSim. In addition, we have conducted experiments using dispersion-aware metrics to quantify variability and stability across 50 independent runs, thereby enabling a more robust and reliable performance evaluation.
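The reliability mechanisms the abstract mentions (transient-failure simulation combined with task replication) follow a standard pattern that can be illustrated independently of MECSim's actual API. The sketch below is a toy Monte-Carlo model, not MECSim code; all function names are hypothetical.

```python
import random

def task_succeeds(fail_prob: float, replicas: int, rng: random.Random) -> bool:
    # The task completes if at least one replica avoids a transient failure.
    return any(rng.random() >= fail_prob for _ in range(replicas))

def estimate_reliability(fail_prob: float, replicas: int,
                         trials: int = 20000, seed: int = 42) -> float:
    # Monte-Carlo estimate of end-to-end task reliability under replication.
    rng = random.Random(seed)
    ok = sum(task_succeeds(fail_prob, replicas, rng) for _ in range(trials))
    return ok / trials
```

With a 30% transient-failure rate, a single copy succeeds about 70% of the time, while three replicas push the estimate toward 1 − 0.3³ ≈ 0.973, which is the kind of trade-off a simulator like this lets researchers quantify.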
Citations: 0
Precision boundary modeling for area-efficient Block Floating Point accumulation
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-13 | DOI: 10.1016/j.sysarc.2026.103704
Jun He, Jing Feng, Xin Ju, Yasong Cao, Zhongdi Luo, Jianchao Yang, Jingkui Yang, Gang Li, Jian Cheng, Dong Chen, Mei Wen
Block Floating Point (BFP) is extensively employed for the low-precision quantization of deep network weights and activations to attain advantages in both hardware efficiency and performance. Nevertheless, when the precision of weights and activations is diminished to below 8 bits, the required high-precision floating-point accumulation becomes a dominant hardware bottleneck in the BFP processing element (PE). To address this challenge, we introduce a framework based on the Frobenius norm Retention Ratio (FnRR) to explore the precision boundaries for BFP accumulation, and extend it to a hierarchical chunk-based accumulation scheme. Comprehensive experiments across representative CNN and LLM models demonstrate that our predicted precision boundaries maintain performance closely matching FP32 baselines, while further precision reduction leads to substantial accuracy degradation, validating the effectiveness of our boundary determination. Guided by this analysis, we present a corresponding hardware design for BFP computation. This design achieves 13.7%–25.2% improvements in area and power efficiency compared with FP32 accumulation under identical quantization settings, and delivers up to 10.3× area and 11.0× power reductions relative to conventional BFP implementations.
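The Frobenius-norm Retention Ratio can be made concrete with a small block-floating-point model: a block shares the exponent of its largest element and keeps m mantissa bits. The sketch below is a generic BFP toy under those assumptions, not the paper's FnRR framework; function names are hypothetical.

```python
import math

def bfp_quantize(block, mantissa_bits):
    # Shared exponent comes from the block's largest magnitude;
    # every element is then rounded onto that fixed-point grid.
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return list(block)
    shared_exp = math.floor(math.log2(max_abs))
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [round(v / step) * step for v in block]

def fnrr(block, mantissa_bits):
    # Frobenius-norm retention ratio: ||Q(X)||_F / ||X||_F.
    q = bfp_quantize(block, mantissa_bits)
    den = math.sqrt(sum(v * v for v in block))
    num = math.sqrt(sum(v * v for v in q))
    return num / den if den else 1.0
```

A ratio near 1 means the quantized block retains the energy of the original; tracking where it departs from 1 as mantissa bits shrink is one way to locate a precision boundary.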
Citations: 0
Peak-memory-aware partitioning and scheduling for multi-tenant DNN model inference
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-09 | DOI: 10.1016/j.sysarc.2026.103696
Jaeho Lee, Ju Min Lee, Haeeun Jeong, Hyunho Kwon, Youngsok Kim, Yongjun Park, Hanjun Kim
As Deep Neural Networks (DNNs) are widely used in various applications, multiple DNN inference models are increasingly run on a single GPU. The simultaneous execution of multiple DNN models can overwhelm the GPU memory with increasing model size, leading to unexpected out-of-memory (OOM) errors. To avoid the OOM errors, existing systems attempt to schedule models at either model-level or layer-level granularity. However, the model-level scheduling schemes inefficiently utilize memory spaces because they preallocate memory based on the model’s peak memory demand, and the layer-level scheduling schemes suffer from high scheduling overhead due to their overly fine-grained scheduling units. This work proposes a new peak-memory-aware DNN model partitioning compiler and scheduler, called Quilt. The Quilt compiler partitions a DNN model into multiple tasks based on their peak memory usage, and the Quilt scheduler orchestrates the tasks of multiple models without the OOM errors. Additionally, the compiler generates a memory pool for tensors shared between partitioned tasks, reducing CPU–GPU communication overhead when consecutively executing the tasks. Compared to the model-level and layer-level scheduling schemes, Quilt reduces overall latency by 25.4% and 37.7%, respectively, while preventing the OOM errors. Moreover, Quilt achieves up to 10.8% faster overall inference latency than the state-of-the-art Triton inference server for 6 DNN models.
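The compiler side of the idea, grouping layers into tasks under a memory budget, can be sketched with a greedy first-fit pass. This is a toy under the simplifying assumption that a task's footprint is the sum of its members' tensor sizes; it is not Quilt's actual partitioning model, and all names are hypothetical.

```python
def partition_layers(layer_mems, budget):
    """Group consecutive layers into tasks whose combined footprint
    stays within the per-task memory budget (greedy first-fit)."""
    tasks, cur, cur_mem = [], [], 0
    for i, mem in enumerate(layer_mems):
        if mem > budget:
            raise ValueError(f"layer {i} alone exceeds the budget")
        if cur_mem + mem > budget:     # close the current task, start a new one
            tasks.append(cur)
            cur, cur_mem = [], 0
        cur.append(i)
        cur_mem += mem
    if cur:
        tasks.append(cur)
    return tasks
```

For example, `partition_layers([4, 3, 5, 2, 6], 8)` groups the five layers as `[[0, 1], [2, 3], [4]]`, so no task ever demands more than 8 units at once.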
Citations: 0
A PCM-based hybrid online learning architecture with adaptive threshold sign-based backpropagation and pulse-aware conductance control
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-08 | DOI: 10.1016/j.sysarc.2026.103683
Zhenhao Jiao, Xiaogang Chen, Tao Hong, Shunfen Li, Xi Li, Weibang Dai, Chengcai Tu, Zhitang Song
The increasing demand for low-latency, energy-efficient online learning in edge devices has driven the exploration of neuromorphic computing and hybrid analog–digital architectures. In this work, we propose a phase-change memory (PCM)-based hybrid architecture for in-situ online learning, which integrates parallel analog matrix–vector multiplication with adaptive digital control. The system features two key innovations: (1) an adaptive threshold sign-based backpropagation (ATSBP) algorithm that dynamically adjusts quantization thresholds for activations and error signals based on real-time feedback from mini-batch statistics, and (2) a pulse-aware conductance control scheme that enables precise conductance tuning of PCM devices using experimentally calibrated pulse-conductance mappings. These mechanisms jointly reduce unnecessary write operations and enhance robustness against device nonidealities such as nonlinearity and drift. Through systematic validation, we demonstrate that our hybrid architecture significantly improves convergence stability and energy efficiency in online learning scenarios, without sacrificing classification accuracy. The proposed system highlights a promising pathway toward scalable, hardware-friendly neuromorphic learning on edge platforms.
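The first innovation, threshold-adapted sign-based updates, can be prototyped in a few lines. The threshold here is derived from the mean absolute gradient of the mini-batch, which is one plausible reading of "real-time feedback from mini-batch statistics" rather than the paper's exact statistic; names are hypothetical.

```python
def atsbp_step(weights, grads, lr=0.1, alpha=1.0):
    """Sign-based update: weights move by a fixed step in the direction
    of the gradient sign, but only when |g| clears an adaptive threshold.
    Skipped updates translate into saved PCM write pulses."""
    mean_abs = sum(abs(g) for g in grads) / len(grads)
    threshold = alpha * mean_abs            # adapted from batch statistics
    new_w, writes = [], 0
    for w, g in zip(weights, grads):
        if abs(g) >= threshold:
            new_w.append(w - lr * (1.0 if g > 0 else -1.0))
            writes += 1
        else:
            new_w.append(w)                 # below threshold: no device write
    return new_w, writes
```

Counting `writes` makes the energy argument visible: small gradients never touch the device, so write traffic drops without changing the sign-based update rule for large gradients.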
Citations: 0
FP8ApproxLib: An FPGA-based approximate multiplier library for 8-bit floating point
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-08 | DOI: 10.1016/j.sysarc.2026.103686
Ruiqi Chen, Yangxintong Lyu, Han Bao, Shidi Tang, Jindong Li, Yanxiang Zhu, Ming Ling, Bruno da Silva
The 8-bit floating-point (FP8) data format has been increasingly adopted in neural network (NN) computations due to its superior dynamic range compared to traditional 8-bit integer (INT8). Nevertheless, the heavy reliance on multiplication in neural network workloads leads to considerable energy consumption, even with FP8, particularly in the context of FPGA-based deployments. To this end, this paper presents FP8ApproxLib, an FPGA-based approximate multiplier library for FP8. Firstly, we conduct a bit-level analysis of the prior approximation method and introduce improvements to reduce the resulting computational error. Based on these improvements, we implement a fine-grained optimized design on mainstream FPGAs (Altera and AMD) using primitives and templates combined with physical layout constraints. Moreover, an automated tool is developed to support user configuration and generate HDL code. We then evaluate the accuracy and hardware efficiency of the FP8 approximate multipliers. The results show that our proposed method achieves an average error reduction of 53.15% (36.74%–72.82%) compared to the prior FP8 approximation method. Moreover, compared to prior 8-bit approximate multipliers, our FP8 designs exhibit the lowest resource utilization. Finally, we integrate the design into the inference phase of three representative NN models (CNN, LLM, and Diffusion), demonstrating its excellent power efficiency. This is the first FP8 approximate multiplier design with architecture-aware fine-grained optimization and deployment for modern FPGA platforms, which can serve as a benchmark for future designs and comparisons of FPGA-based low-precision floating-point approximate multipliers. The code of this work is available in our GitLab.
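A classic way to cheapen floating-point multiplication, and the family such approximate multipliers build on, is logarithm-based mantissa approximation. The sketch below is Mitchell's approximation (log2(1+f) ≈ f) applied to Python floats for illustration only; it is not FP8ApproxLib's actual design and ignores FP8-specific encoding details.

```python
import math

def mitchell_mul(a: float, b: float) -> float:
    """Approximate multiply via Mitchell's log approximation:
    (1+fa)(1+fb) is replaced by (1+fa+fb), dropping the fa*fb term."""
    if a == 0.0 or b == 0.0:
        return 0.0
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    ma, ea = math.frexp(abs(a))        # abs(a) = ma * 2**ea, ma in [0.5, 1)
    mb, eb = math.frexp(abs(b))
    fa, fb = 2 * ma - 1, 2 * mb - 1    # mantissa fractions in [0, 1)
    s, e = fa + fb, ea + eb - 2        # a*b = (1+fa)(1+fb) * 2**e exactly
    if s < 1.0:
        return sign * (1.0 + s) * 2.0 ** e
    return sign * s * 2.0 ** (e + 1)   # fraction overflow: bump the exponent
```

The approximation is exact on powers of two and has a worst-case relative error of about 11%, which is why hardware designs in this family refine the mantissa term rather than use plain Mitchell.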
Citations: 0
Enhancing deterministic transmission in Time-Sensitive Networking: A Joint Guard Band Compression and Non-Disruptive Frame Preemption model
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-06 | DOI: 10.1016/j.sysarc.2026.103697
Ling Zheng, Jingzhuo Liu, Keyao Zhang, Qianxi Men, Li Zhen, Weitao Pan
In the Industrial Internet of Things (IIoT), the real-time interaction of critical control and sensing data imposes stringent requirements on network determinism and reliability. Time-Sensitive Networking (TSN) is widely utilized to ensure low latency and high-precision synchronization; however, efficiently coordinating the nanosecond-level determinism of Time-Triggered (TT) traffic with the high throughput of Event-Triggered (ET) traffic remains a core challenge. Existing Guard Band mechanisms lead to significant bandwidth waste due to fixed reserved windows, while traditional frame preemption mechanisms, though improving efficiency, still suffer from unpredictable TT frame delays. To address these issues, this paper proposes and validates two novel frame preemption strategies. First, a Guard Band Compression and Non-Disruptive Frame Preemption model (GCNFP) is introduced, which compresses the guard band to 76 bytes and employs a padding mechanism to ensure zero-offset handling of ET frames within the guard window, thus eliminating transmission deviations caused by ET blocking and inter-frame gaps (IFG). Second, an enhanced model with judgment mechanism (JM-GCNFP) is developed, which dynamically monitors ET frame transmission within the guard window and delays preemption until the optimal moment. This adaptive strategy maximizes the use of available time slots while reducing unnecessary preemption and padding overhead. Simulation results show that both strategies achieve zero-offset scheduling and significantly improve system performance. Specifically, JM-GCNFP reduces the maximum ET frame delay by more than 40% and improves bandwidth utilization by nearly 20% under high-load scenarios, demonstrating the advantages in achieving enhanced network determinism and transmission efficiency.
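The bandwidth argument behind guard-band compression is easy to quantify: a guard band reserves link time proportional to its byte length at line rate. A back-of-the-envelope helper (assuming a 1 Gbps link and ignoring preamble and IFG overheads, so only an illustration of the scale, not the paper's exact accounting) shows what shrinking the band from a full maximum-size frame (1522 bytes) to 76 bytes buys:

```python
def guard_time_ns(guard_bytes: int, link_rate_gbps: float = 1.0) -> float:
    # Link time reserved by the guard band: at 1 Gbps, 1 bit takes 1 ns.
    return guard_bytes * 8 / link_rate_gbps

full_frame_guard = guard_time_ns(1522)   # classic worst-case guard window
compressed_guard = guard_time_ns(76)     # compressed guard band
saving = 1 - compressed_guard / full_frame_guard
```

Under these assumptions the reserved window shrinks from 12176 ns to 608 ns per gate event, roughly a 95% reduction in time the link sits idle purely as protection.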
Citations: 0
KVC-Q: A high-fidelity and dynamic KV Cache quantization framework for long-context large language models
IF 4.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-06 | DOI: 10.1016/j.sysarc.2026.103699
Yusen Wu, Ruiqin Lin, Jiarong Que, Qixiang Zeng, Hongsen Zhang
Large Language Models (LLMs) are increasingly powerful but face a significant memory bottleneck during long-context autoregressive inference due to the growing KV Cache. Existing uniform quantization methods for the KV Cache often neglect its dynamic and heterogeneous nature, leading to substantial performance degradation at low bit-widths. In this paper, we introduce KVC-Q, a novel framework that implements a dynamic, fine-grained quantization strategy. KVC-Q is built upon three core mechanisms: (1) Recency Priority, which preserves high precision for recent tokens; (2) Importance Preservation, which dynamically identifies and retains crucial long-term tokens in high fidelity; and (3) Head-Aware allocation, which assigns precision based on the sensitivity of different attention heads. Our experiments on the LongBench benchmark show that KVC-Q can reduce KV Cache memory footprint by approximately 70% (effectively competing with 4x compression methods) while retaining over 94% of the baseline performance, enabling models to process over 4x longer contexts on a single consumer-grade GPU. This work presents an effective and practical solution to mitigate the memory constraints of LLMs in long-context applications.
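Mechanism (1), recency priority, is straightforward to prototype: token vectors inside a recent window stay full precision while older entries are quantized per token. The sketch below is a plain-Python toy (symmetric round-to-nearest int4 by default), not KVC-Q's implementation; names are hypothetical.

```python
def compress_kv(cache, recent_window, bits=4):
    """Recency-priority KV compression: the last `recent_window` token
    vectors are kept as-is; older ones are symmetrically quantized
    per token and returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1
    cutoff = max(0, len(cache) - recent_window)
    out = []
    for i, vec in enumerate(cache):
        peak = max(abs(v) for v in vec)
        if i >= cutoff or peak == 0.0:
            out.append(list(vec))            # recent (or all-zero): untouched
            continue
        scale = peak / qmax                  # per-token scale factor
        out.append([round(v / scale) * scale for v in vec])
    return out
```

A real system would keep the int codes plus one scale per token (hence the memory saving) instead of the dequantized floats returned here for readability, and would layer importance- and head-aware precision on top of this recency rule.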
Citations: 0
Orchestrating optimization passes of machine learning compiler for reducing memory footprints of computation graphs
IF 4.1 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2026-01-06 DOI: 10.1016/j.sysarc.2026.103694
Qianwei Yu, Pengbo Nie, Zihan Wang, Chengcheng Wan, Ziyi Lin, He Jiang, Jianjun Zhao, Lei Qiao, Le Chen, Yuting Chen
With the rise of edge computing, there is a growing demand for training and running inference with deep learning (DL) models on memory-constrained devices. However, many DL models, represented as computation graphs, have complex structures and large numbers of parameters, incurring heavy memory consumption at runtime. Reducing their runtime memory footprints is therefore challenging but necessary.

This paper proposes OPass, a novel approach that performs hierarchical memory-constrained operator scheduling of machine learning models and orchestrates optimization passes of Apache TVM (a machine learning compilation framework) to lower the memory footprints of computation graphs, ultimately allowing the graphs to run on memory-constrained devices. First, given a computation graph G, OPass optimizes the graph heuristically and iteratively: OPass learns the effects of passes on the graph; it then optimizes G iteratively, where each iteration selects a pass based on both the reduction it yields in G's memory footprint and its implicit effects that enable further optimizations, and applies that pass. The second core component of OPass is its memory computation technique, named OPassMem, which hierarchically schedules G's operators. It constructs a hierarchical computation graph and employs an iterative scheduling algorithm to progressively reduce memory footprints.

We evaluate OPass on ReBench (a suite of computation graphs) and two real-world models (Transformer and ResNet). The results show the strength of OPass: it reduces graphs' memory footprints by up to 90.83%, outperforming TVM's default pipeline by 2.34×. Specifically, pass orchestration and graph scheduling reduce memory footprints by up to 54.34% and 81%, respectively.
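The iterative pass-selection loop described above can be caricatured as a greedy search. The sketch below is a minimal assumption: the graph representation, the pass set, and the cost model are all hypothetical placeholders, and OPass's actual scoring (which also weighs a pass's implicit enabling effects, not just its immediate footprint reduction) is simplified to pure greedy improvement.

```python
def orchestrate(graph, passes, memory_cost, max_iters=10):
    """Greedy pass orchestration sketch: at each step, apply the pass
    that most reduces the graph's estimated memory footprint.

    graph:       opaque graph object
    passes:      list of functions graph -> graph
    memory_cost: function graph -> float (estimated peak memory)
    Returns the optimized graph and the names of the applied passes.
    """
    history = []
    for _ in range(max_iters):
        base = memory_cost(graph)
        # Try every candidate pass and keep the best improvement.
        best_pass, best_graph, best_cost = None, None, base
        for p in passes:
            candidate = p(graph)
            cost = memory_cost(candidate)
            if cost < best_cost:
                best_pass, best_graph, best_cost = p, candidate, cost
        if best_pass is None:   # no pass reduces memory any further: stop
            break
        graph = best_graph
        history.append(best_pass.__name__)
    return graph, history
```

With a toy "graph" modeled as a list of tensor sizes and `sum` as the cost model, a deduplicating pass is applied once and the loop then terminates because no pass yields further reduction.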
{"title":"Orchestrating optimization passes of machine learning compiler for reducing memory footprints of computation graphs","authors":"Qianwei Yu ,&nbsp;Pengbo Nie ,&nbsp;Zihan Wang ,&nbsp;Chengcheng Wan ,&nbsp;Ziyi Lin ,&nbsp;He Jiang ,&nbsp;Jianjun Zhao ,&nbsp;Lei Qiao ,&nbsp;Le Chen ,&nbsp;Yuting Chen","doi":"10.1016/j.sysarc.2026.103694","DOIUrl":"10.1016/j.sysarc.2026.103694","url":null,"abstract":"<div><div>With the emergence of the needs of edge computing, there arises a demand for training and inferring deep learning (DL) models on memory-constrained devices. However, many DL models, namely computation graphs, have complex structure and plenty of parameters, incurring heavy memory consumption at runtime. Hence it is challenging but necessary to reduce their memory footprints at runtime.</div><div>This paper proposes <span>OPass</span>, a novel approach to perform hierarchical memory-constrained operator scheduling of machine learning models, and orchestrate optimization passes of Apache’s TVM (a machine learning compilation framework) for lowering memory footprints of computation graphs, finally allowing the graphs to run on memory-constrained devices. Firstly, given a computation graph <span><math><mi>G</mi></math></span>, <span>OPass</span> optimizes the graph heuristically and iteratively: <span>OPass</span> learns the effects of passes on the graph; it then optimizes <span><math><mi>G</mi></math></span> iteratively — each iteration picks up a pass by the reduction of the memory footprint of <span><math><mi>G</mi></math></span> and as well the implicit effects of the pass for further optimizations, letting the pass be applied. The second core component of <span>OPass</span> is its memory computation technique, named <span>OPass</span>Mem, which hierarchically schedules <span><math><mi>G</mi></math></span>’s operators. 
It constructs a hierarchical computation graph and employs an iterative scheduling algorithm to progressively reduce memory footprints.</div><div>We evaluate <span>OPass</span> on <span>ReBench</span> (a suite of computation graphs) and two real-world models (Transformer and ResNet). The results show the strength of <span>OPass</span>: it reduces up to 90.83% of graph’s memory footprints, outperforming TVM’s default by 2.34<span><math><mo>×</mo></math></span>. Specifically, pass orchestration and graph scheduling reduce memory footprints by up to 54.34% and 81%, respectively.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"173 ","pages":"Article 103694"},"PeriodicalIF":4.1,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145941383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Traceable Customized and Privacy-Preserving Data Sharing for IoT-Enabled Smart Society
IF 4.1 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2026-01-06 DOI: 10.1016/j.sysarc.2025.103672
Fuyuan Song, Hongjun Ye, Yu Liu, Qin Jiang, Zheng Qin, Zhangjie Fu
IoT-enabled smart society paradigm is rapidly advancing, generating massive volumes of data that require efficient and secure cloud-based data sharing. While outsourcing IoT data alleviates the limitations of resource-constrained devices, it exposes users to critical security risks, including privacy leakage, unauthorized access, and secret key compromise. To address these challenges, we propose a Traceable Customized and Privacy-Preserving Data Sharing (TCPS) scheme for IoT-enabled smart society. TCPS enables selective data sharing through customized secret keys derived from user attributes and identity information. Specifically, we incorporate a key sanity check mechanism that allows a trusted authority to trace malicious users who leak secret keys, ensuring accountability. Furthermore, we employ an identity-based proxy re-encryption mechanism to enable flexible and secure data sharing without incurring extra computational burden. Security analysis confirms that TCPS achieves semantic security under the CPA model, and experimental results demonstrate its efficiency and practicality in real-world IoT–cloud environments.
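To make the proxy re-encryption idea concrete: a proxy holding a re-encryption key can transform a ciphertext for Alice into one for Bob without ever seeing the plaintext. The toy below follows the classic BBS98 ElGamal-style construction over a tiny (deliberately insecure) group; it is an illustrative stand-in, since TCPS's actual scheme is identity-based and differs in its details.

```python
# Toy BBS98-style proxy re-encryption over a small prime-order subgroup.
# Parameters are far too small for real use; illustration only.
P = 23          # safe prime, P = 2*Q + 1
Q = 11          # prime order of the subgroup
G = 2           # generator of the order-Q subgroup mod 23

def keygen(sk):
    """Return (secret key, public key g^sk)."""
    return sk % Q, pow(G, sk, P)

def encrypt(pk_a, m, r):
    """Ciphertext under Alice's public key: (m * g^r, pk_a^r) = (m*g^r, g^(a*r))."""
    return (m * pow(G, r, P) % P, pow(pk_a, r, P))

def rekey(sk_a, sk_b):
    """Re-encryption key a -> b: b / a mod Q (held by the proxy)."""
    return sk_b * pow(sk_a, -1, Q) % Q

def reencrypt(ct, rk):
    """Proxy step: turns g^(a*r) into g^(b*r) without touching m."""
    c1, c2 = ct
    return (c1, pow(c2, rk, P))

def decrypt(sk, ct):
    """Recover m: peel off g^r using the key holder's secret exponent."""
    c1, c2 = ct
    g_r = pow(c2, pow(sk, -1, Q), P)      # (g^(sk*r))^(1/sk) = g^r
    return c1 * pow(g_r, -1, P) % P
```

Note the accountability angle that motivates TCPS's key sanity check: the re-encryption key `b/a` alone reveals neither `a` nor `b`, yet a leaked secret key is directly attributable to its owner.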
{"title":"Traceable Customized and Privacy-Preserving Data Sharing for IoT-Enabled Smart Society","authors":"Fuyuan Song ,&nbsp;Hongjun Ye ,&nbsp;Yu Liu ,&nbsp;Qin Jiang ,&nbsp;Zheng Qin ,&nbsp;Zhangjie Fu","doi":"10.1016/j.sysarc.2025.103672","DOIUrl":"10.1016/j.sysarc.2025.103672","url":null,"abstract":"<div><div>IoT-enabled smart society paradigm is rapidly advancing, generating massive volumes of data that require efficient and secure cloud-based data sharing. While outsourcing IoT data alleviates the limitations of resource-constrained devices, it exposes users to critical security risks, including privacy leakage, unauthorized access, and secret key compromise. To address these challenges, we propose a Traceable Customized and Privacy-Preserving Data Sharing (TCPS) scheme for IoT-enabled smart society. TCPS enables selective data sharing through customized secret keys derived from user attributes and identity information. Specifically, we incorporate a key sanity check mechanism that allows a trusted authority to trace malicious users who leak secret keys, ensuring accountability. Furthermore, we employ an identity-based proxy re-encryption mechanism to enable flexible and secure data sharing without incurring extra computational burden. 
Security analysis confirms that TCPS achieves semantic security under the CPA model, and experimental results demonstrate its efficiency and practicality in real-world IoT–cloud environments.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"172 ","pages":"Article 103672"},"PeriodicalIF":4.1,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145927099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
VM-PHRs: Efficient and verifiable multi-delegated PHRs search scheme for cloud–edge collaborative services
IF 4.1 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2026-01-05 DOI: 10.1016/j.sysarc.2026.103689
Shiwen Zhang, Wenrui Zhu, Wei Liang, Arthur Sandor Voundi Koe, Neal N. Xiong
With the proliferation of smart healthcare services, many hospitals delegate PHRs processing to cloud-based resources. Despite its effectiveness for bounded search and selective record sharing over encrypted data, key-aggregate searchable encryption still suffers from significant drawbacks in current constructions. First, the existing trapdoor matching algorithms fail to achieve accurate matching and exhibit poor robustness against guessing attacks. Second, current works lack efficient mechanisms to enable fine-grained verification of search results. Third, there is currently no efficient mechanism to delegate user privileges. In this paper, we design an efficient and verifiable multi-delegated PHRs search scheme for cloud–edge collaborative services (VM-PHRs). To enable exact trapdoor matching and resist guessing attacks, we develop a new algorithm, EDAsearch. To achieve fine-grained verification of data integrity and correctness, we design a novel distributed protocol that operates over a network of edge servers. To accommodate real-world emergency scenarios, we develop a novel threshold mechanism that supports privilege delegation based on user attributes and hash commitments. Extensive security analysis and performance evaluation of VM-PHRs demonstrate that it is scalable, secure, and practical.
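The abstract does not specify how the edge servers' distributed protocol verifies search results, so the sketch below shows one standard building block that such fine-grained verification schemes commonly rest on, a Merkle tree over the record set: a server returns a record plus a logarithmic-size proof, and the client checks it against a trusted root digest. This is an assumed illustration, not the VM-PHRs protocol itself.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _next_level(level):
    """Hash adjacent pairs; an unpaired last node is carried up unchanged."""
    return [h(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)]

def merkle_root(leaves):
    """Root digest of a Merkle tree over the raw records."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def merkle_proof(leaves, idx):
    """Sibling hashes (with left/right flags) needed to recompute the root."""
    level, proof = [h(x) for x in leaves], []
    while len(level) > 1:
        sib = idx ^ 1
        if sib < len(level):
            proof.append((level[sib], sib < idx))   # (hash, sibling_is_left)
        level, idx = _next_level(level), idx // 2
    return proof

def verify(leaf, proof, root):
    """Recompute the path from one record up to the root and compare."""
    acc = h(leaf)
    for sib, sib_is_left in proof:
        acc = h(sib + acc) if sib_is_left else h(acc + sib)
    return acc == root
```

In this style of scheme, tampering with any returned record, or silently dropping it, makes the recomputed root mismatch, which is what gives per-record (fine-grained) rather than all-or-nothing verification.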
{"title":"VM-PHRs: Efficient and verifiable multi-delegated PHRs search scheme for cloud–edge collaborative services","authors":"Shiwen Zhang ,&nbsp;Wenrui Zhu ,&nbsp;Wei Liang ,&nbsp;Arthur Sandor Voundi Koe ,&nbsp;Neal N. Xiong","doi":"10.1016/j.sysarc.2026.103689","DOIUrl":"10.1016/j.sysarc.2026.103689","url":null,"abstract":"<div><div>With the proliferation of smart healthcare services, many hospitals delegate PHRs processing to cloud-based resources. Despite its effectiveness for bounded search and selective record sharing over encrypted data, key-aggregate searchable encryption still suffers from significant drawbacks in current constructions. First, the existing trapdoor matching algorithms fail to achieve accurate matching and exhibit poor robustness against guessing attacks. Second, current works lack efficient mechanisms to enable fine-grained verification of search results. Third, there is currently no efficient mechanism to delegate user privileges. In this paper, we design an efficient and verifiable multi-delegated PHRs search scheme for cloud–edge collaborative services (VM-PHRs). To enable exact trapdoor matching and resist guessing attacks, we develop a new algorithm, EDAsearch. To achieve fine-grained verification of data integrity and correctness, we design a novel distributed protocol that operates over a network of edge servers. To accommodate real-world emergency scenarios, we develop a novel threshold mechanism that supports privilege delegation based on user attributes and hash commitments. 
Extensive security analysis and performance evaluation of VM-PHRs demonstrate that it is scalable, secure, and practical.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"172 ","pages":"Article 103689"},"PeriodicalIF":4.1,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145927094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0