Latest publications in IEEE Transactions on Computers

2024 Reviewers List
IF 3.6 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-01-14 | DOI: 10.1109/TC.2025.3527650
IEEE Transactions on Computers, vol. 74, no. 1, pp. 334-340. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10840336
Cited by: 0
Shared Recurrence Floating-Point Divide/Sqrt and Integer Divide/Remainder With Early Termination
IF 3.6 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-18 | DOI: 10.1109/TC.2024.3500380
Kevin Kim;Katherine Parry;David Harris;Cedar Turek;Alessandro Maiuolo;Rose Thompson;James Stine
Division, square root, and remainder are fundamental operations required by most computer systems. Floating-point and integer operations are commonly performed on separate datapaths. This paper presents the first detailed implementation of a shared recurrence unit that supports floating-point division/square root and integer division/remainder. It supports early termination and shares the normalization shifter needed for integer and subnormal inputs. Synthesis results show that shared double-precision dividers producing at least 4 bits per cycle are 9-18% smaller and 3-16% faster than separate integer and floating-point units.
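Digit-recurrence dividers retire a fixed number of quotient bits per cycle and can stop early once the remaining quotient bits are known to be zero. A minimal sketch of the idea, using a radix-2 restoring recurrence with a hypothetical early-termination test (the paper's shared unit is a higher-radix design, so this is illustrative only):

```python
def restoring_divide(dividend: int, divisor: int, width: int = 64):
    """Radix-2 restoring division retiring one quotient bit per cycle,
    with a hypothetical early-termination test: once the partial
    remainder and all unprocessed dividend bits are zero, the remaining
    quotient bits are zero and the recurrence can stop early."""
    assert dividend >= 0 and divisor > 0
    quotient = remainder = cycles = 0
    for i in range(width - 1, -1, -1):
        # Bring down the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
        cycles += 1
        # Early termination: all remaining quotient bits are zero.
        if remainder == 0 and (dividend & ((1 << i) - 1)) == 0:
            quotient <<= i
            break
    return quotient, remainder, cycles
```

For example, dividing 2^20 by 2 with a 32-bit datapath terminates well before 32 cycles because the low dividend bits are all zero.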
IEEE Transactions on Computers, vol. 74, no. 2, pp. 740-748.
Cited by: 0
A System-Level Test Methodology for Communication Peripherals in System-on-Chips
IF 3.6 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-18 | DOI: 10.1109/TC.2024.3500375
Francesco Angione;Paolo Bernardi;Nicola di Gruttola Giardino;Gabriele Filipponi;Claudia Bertani;Vincenzo Tancorre
This paper deals with functional System-Level Test (SLT) for System-on-Chip (SoC) communication peripherals. The proposed methodology starts by analyzing the potential weaknesses of applied structural tests, such as scan-based testing. The paper then illustrates how to develop a suite of functional SLT programs to address such issues. When the communication peripheral provides detection/correction features, the methodology proposes a hardware companion module, added to the Automatic Test Equipment (ATE), that interacts with the SoC communication module by purposely corrupting data frames. Experimental results are obtained on an industrial automotive SoC produced by STMicroelectronics, focusing on the Controller Area Network (CAN) communication peripheral and showing the effectiveness of the SLT suite in complementing structural tests.
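The companion module's core trick, purposeful frame corruption so that the peripheral's detection logic is exercised, can be sketched generically. Here zlib's CRC-32 stands in for the CAN CRC-15, and all names are illustrative:

```python
import zlib

def corrupt(frame: bytes, bit: int) -> bytes:
    """Flip one bit of a frame, mimicking an ATE-side companion module
    that purposely corrupts frames to exercise the peripheral's
    error-detection logic (illustrative sketch, not the paper's RTL)."""
    b = bytearray(frame)
    b[bit // 8] ^= 1 << (bit % 8)
    return bytes(b)

frame = b"can-payload"
good_crc = zlib.crc32(frame)
bad = corrupt(frame, 5)
assert zlib.crc32(bad) != good_crc      # single-bit error is detected
assert corrupt(bad, 5) == frame         # flipping again restores the frame
```

A CRC always detects any single-bit error, which is why flipping exactly one bit is a reliable way to trigger the detection path.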
IEEE Transactions on Computers, vol. 74, no. 2, pp. 731-739. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10755212
Cited by: 0
Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators
IF 3.6 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477938
Arne Symons;Linyan Mei;Steven Colleman;Pouya Houshmand;Sebastian Karl;Marian Verhelst
As the landscape of deep neural networks evolves, heterogeneous dataflow accelerators, in the form of multi-core architectures or chiplet-based designs, promise more flexibility and higher inference performance through scalability. So far, these systems exploit the increased parallelism by coarsely mapping a single layer at a time across cores, which incurs frequent, costly off-chip memory accesses, or by pipelining batches of inputs, which falls short of meeting the demands of latency-critical applications. To alleviate these bottlenecks, this work explores a new fine-grained mapping paradigm, referred to as layer fusion, on heterogeneous dataflow accelerators through a novel design space exploration framework called Stream. Stream captures a wide variety of heterogeneous dataflow architectures and mapping granularities, and implements a memory- and communication-aware latency and energy analysis validated with three distinct state-of-the-art hardware implementations. As such, it facilitates a holistic exploration of architecture and mapping by strategically allocating the workload through constraint optimization. The findings demonstrate that integrating layer fusion with heterogeneous dataflow accelerators yields up to 2.2× lower energy-delay product in inference efficiency, addressing both energy consumption and latency concerns. The framework is available open-source at: github.com/kuleuven-micas/stream.
IEEE Transactions on Computers, vol. 74, no. 1, pp. 237-249.
Cited by: 0
FedQClip: Accelerating Federated Learning via Quantized Clipped SGD
IF 3.6 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477972
Zhihao Qu;Ninghui Jia;Baoliu Ye;Shihong Hu;Song Guo
Federated Learning (FL) has emerged as a promising technique for collaboratively training machine learning models among multiple participants while preserving privacy-sensitive data. However, the conventional parameter server architecture presents challenges in terms of communication overhead when employing iterative optimization methods such as Stochastic Gradient Descent (SGD). Although communication compression techniques can reduce the traffic cost of FL during each training round, they often lead to degraded convergence rates, mainly due to compression errors and data heterogeneity. To address these issues, this paper presents FedQClip, an innovative approach that combines quantization and Clipped SGD. FedQClip leverages an adaptive step size inversely proportional to the ℓ2 norm of the gradient, effectively mitigating the negative impacts of quantized errors. Additionally, clipped operations can be applied locally and globally to further expedite training. Theoretical analyses provide evidence that, even under the settings of Non-IID (non-independent and identically distributed) data, FedQClip achieves a convergence rate of O(1/√T), effectively addressing the convergence degradation caused by compression errors. Furthermore, our theoretical analysis highlights the importance of selecting an appropriate number of local updates to enhance the convergence of FL training. Through extensive experiments, we demonstrate that FedQClip outperforms state-of-the-art methods in terms of communication efficiency and convergence rate.
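The two ingredients the abstract names, a step size inversely proportional to the gradient's ℓ2 norm and gradient quantization, can be sketched as follows. The quantizer is a generic QSGD-style stochastic scheme used as a stand-in, not necessarily the paper's exact quantizer, and all parameter names are illustrative:

```python
import numpy as np

def clipped_step(grad, lr, clip):
    """Clipped SGD update: once the gradient's l2 norm exceeds `clip`,
    the effective step size scales inversely with that norm, bounding
    the magnitude of every update."""
    norm = np.linalg.norm(grad)
    return lr * min(1.0, clip / (norm + 1e-12)) * grad

def quantize(grad, levels=16):
    """Unbiased stochastic quantization onto `levels` uniform levels
    per tensor (a generic QSGD-style stand-in)."""
    norm = np.max(np.abs(grad))
    if norm == 0:
        return grad
    scaled = np.abs(grad) / norm * (levels - 1)
    low = np.floor(scaled)
    # Round up with probability equal to the fractional part -> unbiased.
    q = low + (np.random.rand(*grad.shape) < scaled - low)
    return np.sign(grad) * q / (levels - 1) * norm
```

Because the rounding direction is chosen with probability equal to the fractional part, the quantizer is unbiased in expectation, which is the property convergence analyses of compressed SGD typically rely on.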
IEEE Transactions on Computers, vol. 74, no. 2, pp. 717-730.
Cited by: 0
A Deep Learning-Assisted Template Attack Against Dynamic Frequency Scaling Countermeasures
IF 3.6 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477997
Davide Galli;Francesco Lattari;Matteo Matteucci;Davide Zoni
In recent decades, machine learning techniques have been extensively used in place of classical template attacks to implement profiled side-channel analysis. This manuscript focuses on the application of machine learning to counteract Dynamic Frequency Scaling defenses. While state-of-the-art attacks have shown promising results against desynchronization countermeasures, a robust attack strategy has yet to be realized. Motivated by the simplicity and effectiveness of template attacks for devices lacking desynchronization countermeasures, this work presents a Deep Learning-assisted Template Attack (DLaTA) methodology specifically designed to target traces highly desynchronized through Dynamic Frequency Scaling. A deep learning-based pre-processing step recovers information obscured by desynchronization, followed by a template attack for key extraction. Specifically, we developed a three-stage deep learning pipeline to resynchronize traces to a uniform reference clock frequency. The experimental results on the AES cryptosystem executed on a RISC-V System-on-Chip reported a Guessing Entropy equal to 1 and a Guessing Distance greater than 0.25. Results demonstrate the method's ability to successfully retrieve secret keys even in the presence of high desynchronization. As an additional contribution, we publicly release our DFS_DESYNCH database (https://github.com/hardware-fab/DLaTA), containing the first set of real-world highly desynchronized power traces from the execution of a software AES cryptosystem.
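The Guessing Entropy metric the paper reports (GE = 1 means the correct key candidate is always ranked first) is conventionally computed as the average rank of the true key across independent attack runs. A small sketch using the standard definition, with illustrative variable names:

```python
import numpy as np

def guessing_entropy(scores, true_key):
    """Average rank of the correct key over independent attack runs.
    scores[i][k] is run i's score for key candidate k (higher = more
    likely); GE == 1 means the correct key ranks first in every run."""
    ranks = []
    for run in np.asarray(scores):
        order = np.argsort(run)[::-1]                 # best candidate first
        ranks.append(int(np.where(order == true_key)[0][0]) + 1)
    return float(np.mean(ranks))
```

For instance, if the true key is ranked first in one run and second in another, the guessing entropy is 1.5.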
IEEE Transactions on Computers, vol. 74, no. 1, pp. 293-306. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10713265
Cited by: 0
Balancing Privacy and Accuracy Using Significant Gradient Protection in Federated Learning
IF 3.6 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477971
Benteng Zhang;Yingchi Mao;Xiaoming He;Huawei Huang;Jie Wu
Previous state-of-the-art studies have demonstrated that adversaries can access sensitive user data through membership inference attacks (MIAs) in Federated Learning (FL). Introducing differential privacy (DP) into the FL framework is an effective way to enhance the privacy of FL. Nevertheless, in differentially private federated learning (DP-FL), local gradients become excessively sparse in certain training rounds. Especially when training with low privacy budgets, there is a risk of introducing excessive noise into clients' gradients. This issue can lead to a significant degradation in the accuracy of the global model. Thus, how to balance user privacy and global model accuracy becomes a challenge in DP-FL. To this end, we propose an approach, known as differential privacy federated aggregation based on significant gradient protection (DP-FedASGP). DP-FedASGP can mitigate excessive noise by protecting significant gradients and accelerate the convergence of the global model by calculating dynamic aggregation weights for gradients. Experimental results show that DP-FedASGP achieves privacy protection comparable to DP-FedAvg and cpSGD (communication-private SGD based on gradient quantization) but outperforms DP-FedSNLC (sparse noise based on clipping losses and privacy budget costs) and FedSMP (sparsified model perturbation). Furthermore, the average global test accuracy of DP-FedASGP across four datasets and three models is about 2.62%, 4.71%, 0.45%, and 0.19% higher than the above methods, respectively. These improvements indicate that DP-FedASGP is a promising approach for balancing the privacy and accuracy of DP-FL.
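One plausible reading of "protecting significant gradients" is: clip the update, keep only the top-k most significant coordinates, and release those with Gaussian noise, so the noise budget is not wasted on near-zero coordinates. The sketch below is an assumption-laden illustration of that reading, not the paper's exact DP-FedASGP mechanism; every parameter name is hypothetical:

```python
import numpy as np

def dp_protect_significant(grad, clip=1.0, sigma=0.5, k_frac=0.1, rng=None):
    """Illustrative significant-gradient protection: clip the update to
    l2 norm `clip`, keep only the top-k 'significant' coordinates by
    magnitude, and perturb those with Gaussian noise. Hypothetical
    sketch, not the paper's exact rule."""
    rng = np.random.default_rng() if rng is None else rng
    g = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))
    k = max(1, int(k_frac * g.size))
    idx = np.argpartition(np.abs(g), -k)[-k:]   # indices of largest |g|
    out = np.zeros_like(g)
    out[idx] = g[idx] + rng.normal(0.0, sigma * clip, size=k)
    return out
```

Noising only k coordinates instead of all of them reduces the total injected noise energy for the same per-coordinate scale, which is the intuition behind avoiding "excessive noise" at low privacy budgets.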
IEEE Transactions on Computers, vol. 74, no. 1, pp. 278-292.
Cited by: 0
Collaborative Neural Architecture Search for Personalized Federated Learning
IF 3.6 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477945
Yi Liu;Song Guo;Jie Zhang;Zicong Hong;Yufeng Zhan;Qihua Zhou
Personalized federated learning (pFL) is a promising approach to train customized models for multiple clients over heterogeneous data distributions. However, existing works on pFL often rely on the optimization of model parameters and ignore the personalization demand on neural network architecture, which can greatly affect model performance in practice. Therefore, generating personalized models with different neural architectures for different clients is a key issue in implementing pFL in a heterogeneous environment. Motivated by Neural Architecture Search (NAS), a model architecture searching methodology, this paper aims to automate the model design in a collaborative manner while achieving good training performance for each client. Specifically, we reconstruct the centralized searching of NAS into a distributed scheme called Personalized Architecture Search (PAS), where differentiable architecture fine-tuning is achieved via gradient-descent optimization, thus making each client obtain the most appropriate model. Furthermore, to aggregate knowledge from heterogeneous neural architectures, a knowledge distillation-based training framework is proposed to achieve a good trade-off between generalization and personalization in federated learning. Extensive experiments demonstrate that our architecture-level personalization method achieves higher accuracy under non-IID settings, while not aggravating model complexity over state-of-the-art benchmarks.
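Differentiable architecture fine-tuning of the kind PAS describes builds on a standard trick: relax the discrete choice of operation on each edge into a softmax-weighted mixture (as in DARTS), so the architecture parameters themselves can follow gradient descent. A minimal sketch of that relaxation (illustrative, not the paper's exact formulation):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over architecture parameters.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def mixed_op(alpha, candidate_outputs):
    """DARTS-style continuous relaxation: an edge's output is the
    softmax(alpha)-weighted sum of every candidate operation's output,
    making the operation choice differentiable in alpha."""
    w = softmax(np.asarray(alpha, dtype=float))
    return sum(wi * out for wi, out in zip(w, candidate_outputs))
```

After training, the discrete architecture is typically recovered by keeping the operation with the largest alpha on each edge.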
IEEE Transactions on Computers, vol. 74, no. 1, pp. 250-262.
Cited by: 0
A Heterogeneous and Adaptive Architecture for Decision-Tree-Based ACL Engine on FPGA
IF 3.6 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477955
Yao Xin;Chengjun Jia;Wenjun Li;Ori Rottenstreich;Yang Xu;Gaogang Xie;Zhihong Tian;Jun Li
Access Control Lists (ACLs) are crucial for ensuring the security and integrity of modern cloud and carrier networks by regulating access to sensitive information and resources. However, previous software and hardware implementations no longer meet the requirements of modern datacenters. The emergence of FPGA-based SmartNICs presents an opportunity to offload ACL functions from the host CPU, leading to improved network performance in datacenter applications. However, previous FPGA-based ACL designs lacked the flexibility to support different rulesets without hardware reconfiguration while maintaining high performance. In this paper, we propose HACL, a heterogeneous and adaptive architecture for a decision-tree-based ACL engine on FPGA. By employing techniques such as tree decomposition and recirculated pipeline scheduling, HACL can accommodate various rulesets without reconfiguring the underlying architecture. To facilitate the efficient mapping of different decision trees to memory and optimize the throughput of a ruleset, we also introduce a heterogeneous framework with a compiler on the CPU platform for HACL. We implement HACL on a typical SmartNIC and evaluate its performance. The results demonstrate that HACL achieves a throughput exceeding 260 Mpps when processing 100K-scale ACL rulesets, with low hardware resource utilization. By integrating more engines, HACL can achieve even higher throughput and support larger rulesets.
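The decision-tree lookup at the heart of such engines cuts on header fields at internal nodes, then linearly searches a small leaf ruleset in priority order. A few-line software sketch of that lookup; the field names, rule format, and default action are illustrative, not HACL's actual data structures:

```python
# Minimal decision-tree packet classifier: internal nodes cut on one
# header field; leaves hold a small ruleset searched in priority order.
class Node:
    def __init__(self, field=None, threshold=None, left=None, right=None, rules=None):
        self.field, self.threshold = field, threshold
        self.left, self.right = left, right
        self.rules = rules or []          # list of (priority, match, action)

def classify(node, pkt):
    # Walk internal nodes down to a leaf.
    while node.field is not None:
        node = node.left if pkt[node.field] <= node.threshold else node.right
    # Linear search of the leaf's rules, lowest priority number first.
    for _, match, action in sorted(node.rules, key=lambda r: r[0]):
        if all(lo <= pkt[f] <= hi for f, (lo, hi) in match.items()):
            return action
    return "deny"                          # illustrative default action
```

In hardware, the tree walk maps naturally onto a pipeline stage per level, which is what makes per-packet lookup latency predictable.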
IEEE Transactions on Computers, vol. 74, no. 1, pp. 263-277.
Citations: 0
Dependability of the K Minimum Values Sketch: Protection and Comparative Analysis
IF 3.6, Zone 2 (Computer Science), Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE, Pub Date: 2024-10-09, DOI: 10.1109/TC.2024.3475588
Jinhua Zhu;Zhen Gao;Pedro Reviriego;Shanshan Liu;Fabrizio Lombardi
A basic operation in big data analysis is cardinality estimation; to estimate the cardinality at high speed and with a low memory requirement, data sketches, which provide approximate estimates, are usually used. The K Minimum Values (KMV) sketch is one of the most popular options; however, soft errors on memories in KMV may substantially degrade performance. This paper is the first to consider the impact of soft errors on the KMV sketch and to compare it with HyperLogLog (HLL), another widely used sketch for cardinality estimation. Initially, the operation of KMV in the presence of soft errors in the memory (and hence its dependability) is studied by theoretical analysis and by error-injection simulation. The evaluation results show that errors during the construction phase of KMV may cause large deviations in the estimate results. Subsequently, based on the algorithmic features of the KMV sketch, two protection schemes are proposed. The first scheme is based on using a single parity check (SPC) to detect errors and reduce their impact on the cardinality estimate; the second scheme is based on the incremental property of the memory list in KMV. The presented evaluation shows that both schemes can dramatically improve the performance of KMV, and the SPC scheme performs better even though it requires a larger memory footprint and more overhead in the checking operation. Finally, it is shown that soft errors on the unprotected KMV produce larger worst-case errors than in HLL, but the average impact of errors is lower; also, the protected KMV using the proposed schemes is more dependable than HLL with existing protection techniques.
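To make the estimator concrete, here is a minimal, self-contained sketch of KMV cardinality estimation: each item is hashed to a pseudo-uniform value in [0, 1), only the k smallest distinct hash values are kept, and the distinct count is estimated from the k-th smallest. This is an illustrative assumption-laden toy, not the paper's implementation; the hash choice (truncated SHA-256), k, and tie handling are arbitrary here.

```python
import hashlib

def kmv_estimate(items, k=64):
    """K Minimum Values cardinality estimate.

    Hash each item to a pseudo-uniform value in [0, 1) and keep the k
    smallest distinct hash values. If h_k is the k-th smallest, the
    number of distinct items is estimated as (k - 1) / h_k.
    """
    minima = set()
    for item in items:
        digest = hashlib.sha256(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big") / 2.0**64  # map to [0, 1)
        minima.add(h)
        if len(minima) > k:
            minima.discard(max(minima))  # keep only the k smallest
    if len(minima) < k:
        return float(len(minima))  # fewer distinct values than k: exact
    return (k - 1) / max(minima)
```

For instance, `kmv_estimate(range(10000), k=256)` returns a value close to 10000; the relative error shrinks roughly as 1/sqrt(k), which is why a soft error that corrupts one of the stored minima can noticeably skew the estimate.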
IEEE Transactions on Computers, vol. 74, no. 1, pp. 210-221.
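The single parity check (SPC) idea mentioned in the abstract can be illustrated in miniature: one extra bit, the XOR of all data bits, detects any odd number of bit flips. This sketch is an assumption for illustration only; the paper applies SPC to KMV's stored memory entries and analyzes its effect on the cardinality estimate, which this toy does not model.

```python
def parity(x: int) -> int:
    """Even-parity bit: XOR of all bits of x."""
    p = 0
    while x:
        p ^= x & 1
        x >>= 1
    return p

def protect(value: int):
    """Store a value together with its parity bit."""
    return (value, parity(value))

def check(stored) -> bool:
    """True if the stored word is consistent, i.e. no odd number of
    bit flips has occurred in the value or the parity bit."""
    value, p = stored
    return parity(value) == p
```

Flipping any single bit of a protected word makes `check` fail, so a corrupted sketch entry can be detected and discarded rather than silently skewing the estimate.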
Citations: 0