
Latest Publications in IEEE Transactions on Computers

ECO-CRYSTALS: Efficient Cryptography CRYSTALS on Standard RISC-V ISA
IF 3.6 | CAS Tier 2 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-21 | DOI: 10.1109/TC.2024.3483631
Xinyi Ji;Jiankuo Dong;Junhao Huang;Zhijian Yuan;Wangchen Dai;Fu Xiao;Jingqiang Lin
The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementations on various platforms, including x86, ARM, FPGA, and GPU. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on the standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium, whose two most time-consuming operations are Keccak and polynomial multiplication. Notably, this paper is the first highly optimized assembly software implementation to deploy Kyber and Dilithium on the 64-bit RISC-V ISA. Firstly, we propose a better scheduling strategy for Keccak, specifically tailored for the 64-bit dual-issue RISC-V architecture. Our 24-round Keccak permutation (Keccak-$p$[1600,24]) achieves a 59.18% speed-up over the reference implementation. Secondly, we apply two modular arithmetic techniques (Montgomery arithmetic and Plantard arithmetic) in the polynomial multiplication of Kyber and Dilithium to obtain better lazy reduction. Then, we propose a flexible dual-instruction-issue scheme for the Number Theoretic Transform (NTT). For matrix-vector multiplication, we introduce a row-to-column processing methodology to minimize expensive memory access operations. Compared to the reference implementation, we obtain speedups of 53.85%–85.57% for NTT, matrix-vector multiplication, and INTT in ECO-CRYSTALS. Finally, the ECO-CRYSTALS implementations of key generation, encapsulation, and decapsulation in Kyber take 399k, 448k, and 479k cycles respectively, yielding speedups of 60.82%, 63.93%, and 65.56% over the NIST reference implementation. Similarly, the ECO-CRYSTALS implementations of key generation, sign, and verify in Dilithium take 1,364k, 3,191k, and 1,369k cycles, showcasing speedups of 54.84%, 64.98%, and 57.20%, respectively.
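As a concrete illustration of the Montgomery arithmetic the abstract mentions, here is a minimal Python sketch of Montgomery reduction specialized to Kyber's modulus q = 3329 with R = 2^16. It mirrors the standard reference-code formulation; it is an illustration of the technique, not the paper's RISC-V assembly, and Plantard reduction follows a similar word-level pattern with different constants.

```python
# Montgomery reduction for Kyber's q = 3329: maps a -> a * 2^{-16} mod q.
Q = 3329
QINV = pow(Q, -1, 1 << 16)           # q^{-1} mod 2^16 = 62209

def montgomery_reduce(a: int) -> int:
    """Given |a| < q * 2^15, return t with t = a * 2^{-16} (mod q)."""
    u = (a * QINV) & 0xFFFF          # low 16 bits of a * q^{-1}
    if u >= 1 << 15:                 # reinterpret as a signed 16-bit value
        u -= 1 << 16
    return (a - u * Q) >> 16         # low 16 bits cancel, so the shift is exact

# Self-check against plain modular arithmetic.
for a in (1, 12345, -54321, Q * (1 << 14)):
    assert (montgomery_reduce(a) - a * pow(1 << 16, -1, Q)) % Q == 0
print("Montgomery reduction OK")
```

The appeal of the trick is that the reduction uses only multiplications, shifts, and masks, which schedule well on a dual-issue in-order pipeline.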
{"title":"ECO-CRYSTALS: Efficient Cryptography CRYSTALS on Standard RISC-V ISA","authors":"Xinyi Ji;Jiankuo Dong;Junhao Huang;Zhijian Yuan;Wangchen Dai;Fu Xiao;Jingqiang Lin","doi":"10.1109/TC.2024.3483631","DOIUrl":"https://doi.org/10.1109/TC.2024.3483631","url":null,"abstract":"The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementation on various platforms, including x86, ARM, FPGA, GPU, etc. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium, where the two most time-consuming operations are Keccak and polynomial multiplication. Notably, this paper is the first highly-optimized assembly software implementation to deploy Kyber and Dilithium on the 64-bit RISC-V ISA. Firstly, we propose a better scheduling strategy for Keccak, which is specifically tailored for the 64-bit dual-issue RISC-V architecture. Our 24-round Keccak permutation (Keccak-<inline-formula><tex-math>$p$</tex-math></inline-formula>[1600,24]) achieves a 59.18% speed-up compared to the reference implementation. Secondly, we apply two modular arithmetic (Montgomery arithmetic and Plantard arithmetic) in the polynomial multiplication of Kyber and Dilithium to get a better lazy reduction. Then, we propose a flexible dual-instruction-issue scheme of Number Theoretic Transform (NTT). As for the matrix-vector multiplication, we introduce a row-to-column processing methodology to minimize the expensive memory access operations. Compared to the reference implementation, we obtain a speedup of 53.85%<inline-formula><tex-math>$thicksim$</tex-math></inline-formula>85.57% for NTT, matrix-vector multiplication, and INTT in our ECO-CRYSTALS. Finally, the ECO-CRYSTALS implementation for key generation, encapsulation, and decapsulation in Kyber achieves 399k, 448k, and 479k cycles respectively, achieving speedups of 60.82%, 63.93%, and 65.56% compared to the NIST reference implementation. Similarly, the ECO-CRYSTALS implementation for key generation, sign, and verify in Dilithium reaches 1 364k, 3 191k, and 1 369k cycles, showcasing speedups of 54.84%, 64.98%, and 57.20%, respectively.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 2","pages":"401-413"},"PeriodicalIF":3.6,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators
IF 3.6 | CAS Tier 2 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477938
Arne Symons;Linyan Mei;Steven Colleman;Pouya Houshmand;Sebastian Karl;Marian Verhelst
As the landscape of deep neural networks evolves, heterogeneous dataflow accelerators, in the form of multi-core architectures or chiplet-based designs, promise more flexibility and higher inference performance through scalability. So far, these systems exploit the increased parallelism by coarsely mapping a single layer at a time across cores, which incurs frequent, costly off-chip memory accesses, or by pipelining batches of inputs, which falls short of meeting the demands of latency-critical applications. To alleviate these bottlenecks, this work explores a new fine-grained mapping paradigm, referred to as layer fusion, on heterogeneous dataflow accelerators through a novel design space exploration framework called Stream. Stream captures a wide variety of heterogeneous dataflow architectures and mapping granularities, and implements a memory- and communication-aware latency and energy analysis validated against three distinct state-of-the-art hardware implementations. As such, it facilitates a holistic exploration of architecture and mapping by strategically allocating the workload through constraint optimization. The findings demonstrate that the integration of layer fusion with heterogeneous dataflow accelerators yields up to $2.2\times$ lower energy-delay product in inference efficiency, addressing both energy consumption and latency concerns. The framework is available open-source at: github.com/kuleuven-micas/stream.
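To make the energy-delay product (EDP) metric concrete, the toy model below contrasts layer-by-layer mapping, which spills every intermediate activation off-chip, with layer fusion, which keeps tiles in on-chip buffers. All per-access costs are invented for illustration and are not Stream's calibrated numbers.

```python
# Toy EDP comparison (hypothetical costs, not Stream's calibrated model).
DRAM_PJ, SRAM_PJ = 100.0, 1.0        # energy per access in pJ (assumed)
DRAM_NS, SRAM_NS = 50.0, 1.0         # latency per access in ns (assumed)
ACTS = 1_000_000                     # intermediate activations between layers

def edp(n_dram: int, n_sram: int) -> float:
    energy = n_dram * DRAM_PJ + n_sram * SRAM_PJ
    latency = n_dram * DRAM_NS + n_sram * SRAM_NS
    return energy * latency

layer_by_layer = edp(n_dram=2 * ACTS, n_sram=0)   # write then re-read DRAM
layer_fused = edp(n_dram=0, n_sram=2 * ACTS)      # activations stay on-chip
print(f"EDP ratio: {layer_by_layer / layer_fused:.0f}x in favor of fusion")
```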
{"title":"Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators","authors":"Arne Symons;Linyan Mei;Steven Colleman;Pouya Houshmand;Sebastian Karl;Marian Verhelst","doi":"10.1109/TC.2024.3477938","DOIUrl":"https://doi.org/10.1109/TC.2024.3477938","url":null,"abstract":"As the landscape of deep neural networks evolves, heterogeneous dataflow accelerators, in the form of multi-core architectures or chiplet-based designs, promise more flexibility and higher inference performance through scalability. So far, these systems exploit the increased parallelism by coarsely mapping a single layer at a time across cores, which incurs frequent costly off-chip memory accesses, or by pipelining batches of inputs, which falls short in meeting the demands of latency-critical applications. To alleviate these bottlenecks, this work explores a new fine-grain mapping paradigm, referred to as layer fusion, on heterogeneous dataflow accelerators through a novel design space exploration framework called \u0000<i>Stream</i>\u0000. \u0000<i>Stream</i>\u0000 captures a wide variety of heterogeneous dataflow architectures and mapping granularities, and implements a memory and communication-aware latency and energy analysis validated with three distinct state-of-the-art hardware implementations. As such, it facilitates a holistic exploration of architecture and mapping, by strategically allocating the workload through constraint optimization. The findings demonstrate that the integration of layer fusion with heterogeneous dataflow accelerators yields up to \u0000<inline-formula><tex-math>$2.2times$</tex-math></inline-formula>\u0000 lower energy-delay product in inference efficiency, addressing both energy consumption and latency concerns. The framework is available open-source at: github.com/kuleuven-micas/stream.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 1","pages":"237-249"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FedQClip: Accelerating Federated Learning via Quantized Clipped SGD
IF 3.6 | CAS Tier 2 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477972
Zhihao Qu;Ninghui Jia;Baoliu Ye;Shihong Hu;Song Guo
Federated Learning (FL) has emerged as a promising technique for collaboratively training machine learning models among multiple participants while preserving privacy-sensitive data. However, the conventional parameter server architecture presents challenges in terms of communication overhead when employing iterative optimization methods such as Stochastic Gradient Descent (SGD). Although communication compression techniques can reduce the traffic cost of FL during each training round, they often lead to degraded convergence rates, mainly due to compression errors and data heterogeneity. To address these issues, this paper presents FedQClip, an innovative approach that combines quantization and Clipped SGD. FedQClip leverages an adaptive step size inversely proportional to the $\ell_{2}$ norm of the gradient, effectively mitigating the negative impacts of quantization errors. Additionally, clipped operations can be applied locally and globally to further expedite training. Theoretical analyses provide evidence that, even under Non-IID (non-independent and identically distributed) data settings, FedQClip achieves a convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$, effectively addressing the convergence degradation caused by compression errors. Furthermore, our theoretical analysis highlights the importance of selecting an appropriate number of local updates to enhance the convergence of FL training. Through extensive experiments, we demonstrate that FedQClip outperforms state-of-the-art methods in terms of communication efficiency and convergence rate.
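The two mechanisms named in the abstract, a step size inversely proportional to the gradient's $\ell_{2}$ norm and gradient quantization, can be sketched as follows. The QSGD-style unbiased quantizer and the function names are illustrative stand-ins, not the authors' exact formulation.

```python
import numpy as np

def clipped_sgd_step(w, grad, eta, gamma):
    """Clipped SGD: effective step size min(eta, gamma / ||grad||_2),
    capping the update magnitude at gamma regardless of gradient scale."""
    h = min(eta, gamma / (np.linalg.norm(grad) + 1e-12))
    return w - h * grad

def stochastic_quantize(v, levels=256):
    """Unbiased uniform quantizer (QSGD-style): randomized rounding of
    normalized magnitudes, so the quantized vector equals v in expectation."""
    scale = np.abs(v).max() + 1e-12
    scaled = np.abs(v) / scale * (levels - 1)
    low = np.floor(scaled)
    q = low + (np.random.rand(*v.shape) < (scaled - low))  # round up w.p. frac
    return np.sign(v) * q * scale / (levels - 1)

w, g = np.zeros(4), np.array([3.0, -4.0, 0.0, 12.0])
print(clipped_sgd_step(w, stochastic_quantize(g), eta=0.1, gamma=0.5))
```

Because the step size shrinks as the (quantized) gradient norm grows, a large quantization error cannot translate into a proportionally large parameter update.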
{"title":"FedQClip: Accelerating Federated Learning via Quantized Clipped SGD","authors":"Zhihao Qu;Ninghui Jia;Baoliu Ye;Shihong Hu;Song Guo","doi":"10.1109/TC.2024.3477972","DOIUrl":"https://doi.org/10.1109/TC.2024.3477972","url":null,"abstract":"Federated Learning (FL) has emerged as a promising technique for collaboratively training machine learning models among multiple participants while preserving privacy-sensitive data. However, the conventional parameter server architecture presents challenges in terms of communication overhead when employing iterative optimization methods such as Stochastic Gradient Descent (SGD). Although communication compression techniques can reduce the traffic cost of FL during each training round, they often lead to degraded convergence rates, mainly due to compression errors and data heterogeneity. To address these issues, this paper presents FedQClip, an innovative approach that combines quantization and Clipped SGD. FedQClip leverages an adaptive step size inversely proportional to the <inline-formula><tex-math>$ell_{2}$</tex-math></inline-formula> norm of the gradient, effectively mitigating the negative impacts of quantized errors. Additionally, clipped operations can be applied locally and globally to further expedite training. Theoretical analyses provide evidence that, even under the settings of Non-IID (non-independent and identically distributed) data, FedQClip achieves a convergence rate of <inline-formula><tex-math>$mathcal{O}(frac{1}{sqrt{T}})$</tex-math></inline-formula>, effectively addressing the convergence degradation caused by compression errors. Furthermore, our theoretical analysis highlights the importance of selecting an appropriate number of local updates to enhance the convergence of FL training. Through extensive experiments, we demonstrate that FedQClip outperforms state-of-the-art methods in terms of communication efficiency and convergence rate.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 2","pages":"717-730"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Deep Learning-Assisted Template Attack Against Dynamic Frequency Scaling Countermeasures
IF 3.6 | CAS Tier 2 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477997
Davide Galli;Francesco Lattari;Matteo Matteucci;Davide Zoni
In the last decades, machine learning techniques have been extensively used in place of classical template attacks to implement profiled side-channel analysis. This manuscript focuses on the application of machine learning to counteract Dynamic Frequency Scaling defenses. While state-of-the-art attacks have shown promising results against desynchronization countermeasures, a robust attack strategy has yet to be realized. Motivated by the simplicity and effectiveness of template attacks on devices lacking desynchronization countermeasures, this work presents a Deep Learning-assisted Template Attack (DLaTA) methodology specifically designed to target traces highly desynchronized by Dynamic Frequency Scaling. A deep learning-based pre-processing step recovers information obscured by desynchronization, followed by a template attack for key extraction. Specifically, we developed a three-stage deep learning pipeline to resynchronize traces to a uniform reference clock frequency. Experimental results on an AES cryptosystem executed on a RISC-V System-on-Chip report a Guessing Entropy equal to 1 and a Guessing Distance greater than 0.25, demonstrating the method's ability to successfully retrieve secret keys even in the presence of high desynchronization. As an additional contribution, we publicly release our DFS_DESYNCH database (https://github.com/hardware-fab/DLaTA), containing the first set of real-world highly desynchronized power traces from the execution of a software AES cryptosystem.
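Guessing entropy, the headline metric above, is the mean rank of the true key in the attack's score-sorted candidate list, so a value of 1 means the correct key is always ranked first. A minimal reference computation of this standard metric (not the paper's code):

```python
import numpy as np

def guessing_entropy(scores: np.ndarray, true_key: int) -> float:
    """scores: (n_attacks, n_candidates), higher score = more likely.
    Returns the mean 1-based rank of the true key across attacks."""
    order = np.argsort(-scores, axis=1)               # best candidate first
    ranks = np.argmax(order == true_key, axis=1) + 1  # position of true key
    return float(ranks.mean())

# Example: 3 attacks over 256 key-byte candidates; byte 0x2A dominates.
rng = np.random.default_rng(0)
scores = rng.random((3, 256))
scores[:, 0x2A] += 1.0
print(guessing_entropy(scores, 0x2A))                 # -> 1.0
```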
{"title":"A Deep Learning-Assisted Template Attack Against Dynamic Frequency Scaling Countermeasures","authors":"Davide Galli;Francesco Lattari;Matteo Matteucci;Davide Zoni","doi":"10.1109/TC.2024.3477997","DOIUrl":"https://doi.org/10.1109/TC.2024.3477997","url":null,"abstract":"In the last decades, machine learning techniques have been extensively used in place of classical template attacks to implement profiled side-channel analysis. This manuscript focuses on the application of machine learning to counteract Dynamic Frequency Scaling defenses. While state-of-the-art attacks have shown promising results against desynchronization countermeasures, a robust attack strategy has yet to be realized. Motivated by the simplicity and effectiveness of template attacks for devices lacking desynchronization countermeasures, this work presents a Deep Learning-assisted Template Attack (DLaTA) methodology specifically designed to target highly desynchronized traces through Dynamic Frequency Scaling. A deep learning-based pre-processing step recovers information obscured by desynchronization, followed by a template attack for key extraction. Specifically, we developed a three-stage deep learning pipeline to resynchronize traces to a uniform reference clock frequency. The experimental results on the AES cryptosystem executed on a RISC-V System-on-Chip reported a Guessing Entropy equal to 1 and a Guessing Distance greater than 0.25. Results demonstrate the method's ability to successfully retrieve secret keys even in the presence of high desynchronization. As an additional contribution, we publicly release our \u0000<monospace>DFS_DESYNCH</monospace>\u0000 database\u0000<xref><sup>1</sup></xref>\u0000<fn><label><sup>1</sup></label><p><uri>https://github.com/hardware-fab/DLaTA</uri></p></fn>\u0000 containing the first set of real-world highly desynchronized power traces from the execution of a software AES cryptosystem.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 1","pages":"293-306"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10713265","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AsyncGBP+: Bridging SSL/TLS and Heterogeneous Computing Power With GPU-Based Providers
IF 3.6 | CAS Tier 2 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477987
Yi Bian;Fangyu Zheng;Yuewu Wang;Lingguang Lei;Yuan Ma;Tian Zhou;Jiankuo Dong;Guang Fan;Jiwu Jing
The rapid evolution of GPUs has emerged as a promising solution for accelerating the widely used SSL/TLS, which faces performance bottlenecks due to its underlying heavy cryptographic computations. Nevertheless, substantial structural adjustments from the parallel mode of GPUs to the serial mode of the SSL/TLS stack are imperative, potentially constraining the practical deployment of GPUs. In this paper, we propose AsyncGBP+, a three-level framework that facilitates the seamless conversion of cryptographic requests from synchronous to asynchronous mode. We conduct an in-depth analysis of the OpenSSL provider and cryptographic primitive features relevant to GPU implementations, aiming to fully exploit the potential of GPUs. Notably, AsyncGBP+ supports three working settings (offline/online/hybrid), finely tailored for various public key cryptographic primitives, including traditional ones like X25519, Ed25519, and ECDSA, and the quantum-safe CRYSTALS-Kyber. A comprehensive evaluation demonstrates that AsyncGBP+ can efficiently achieve an improvement of up to 137.8$\times$ compared to the default OpenSSL provider (for X25519, Ed25519, ECDSA) and 113.30$\times$ compared to the OpenSSL-compatible liboqs (for CRYSTALS-Kyber) in a single-process setting. Furthermore, AsyncGBP+ surpasses the current fastest commercial off-the-shelf OpenSSL-compatible TLS accelerator with a 5.3$\times$ to 7.0$\times$ performance improvement.
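The synchronous-to-asynchronous conversion described above can be pictured as a batching dispatcher: callers block on individual requests while a collector drains them into batches for a parallel backend (the GPU in AsyncGBP+). The asyncio sketch below uses a dummy backend and invented names; it is a schematic of the pattern, not the OpenSSL-provider implementation.

```python
import asyncio

def backend_batch(reqs):
    # Stand-in for a GPU kernel that processes many requests in parallel.
    return [f"sig({r})" for r in reqs]

async def dispatcher(queue: asyncio.Queue, max_batch: int = 64):
    while True:
        batch = [await queue.get()]              # block for the first request
        while not queue.empty() and len(batch) < max_batch:
            batch.append(queue.get_nowait())     # drain whatever is waiting
        for (_, fut), res in zip(batch, backend_batch([r for r, _ in batch])):
            fut.set_result(res)                  # wake each waiting caller

async def sign(queue, msg):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((msg, fut))
    return await fut                             # caller keeps a sync-style flow

async def main():
    q = asyncio.Queue()
    task = asyncio.create_task(dispatcher(q))
    print(await asyncio.gather(*(sign(q, f"m{i}") for i in range(5))))
    task.cancel()

asyncio.run(main())
```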
{"title":"AsyncGBP${}^{+}$+: Bridging SSL/TLS and Heterogeneous Computing Power With GPU-Based Providers","authors":"Yi Bian;Fangyu Zheng;Yuewu Wang;Lingguang Lei;Yuan Ma;Tian Zhou;Jiankuo Dong;Guang Fan;Jiwu Jing","doi":"10.1109/TC.2024.3477987","DOIUrl":"https://doi.org/10.1109/TC.2024.3477987","url":null,"abstract":"The rapid evolution of GPUs has emerged as a promising solution for accelerating the worldwide used SSL/TLS, which faces performance bottlenecks due to its underlying heavy cryptographic computations. Nevertheless, substantial structural adjustments from the parallel mode of GPUs to the serial mode of the SSL/TLS stack are imperative, potentially constraining the practical deployment of GPUs. In this paper, we propose AsyncGBP<inline-formula><tex-math>${}^{+}$</tex-math></inline-formula>, a three-level framework that facilitates the seamless conversion of cryptographic requests from synchronous to asynchronous mode. We conduct an in-depth analysis of the OpenSSL provider and cryptographic primitive features relevant to GPU implementations, aiming to fully exploit the potential of GPUs. Notably, AsyncGBP<inline-formula><tex-math>${}^{+}$</tex-math></inline-formula> supports three working settings (offline/online/hybrid), finely tailored for various public key cryptographic primitives, including traditional ones like X25519, Ed25519, ECDSA, and the quantum-safe CRYSTALS-Kyber. A comprehensive evaluation demonstrates that AsyncGBP<inline-formula><tex-math>${}^{+}$</tex-math></inline-formula> can efficiently achieve an improvement of up to 137.8<inline-formula><tex-math>$times$</tex-math></inline-formula> compared to the default OpenSSL provider (for X25519, Ed25519, ECDSA) and 113.30<inline-formula><tex-math>$times$</tex-math></inline-formula> compared to OpenSSL-compatible <monospace>liboqs</monospace> (for CRYSTALS-Kyber) in a single-process setting. Furthermore, AsyncGBP<inline-formula><tex-math>${}^{+}$</tex-math></inline-formula> surpasses the current fastest commercial-off-the-shelf OpenSSL-compatible TLS accelerator with a 5.3<inline-formula><tex-math>$times$</tex-math></inline-formula> to 7.0<inline-formula><tex-math>$times$</tex-math></inline-formula> performance improvement.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 2","pages":"356-370"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Balancing Privacy and Accuracy Using Significant Gradient Protection in Federated Learning
IF 3.6 | CAS Tier 2 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477971
Benteng Zhang;Yingchi Mao;Xiaoming He;Huawei Huang;Jie Wu
Previous state-of-the-art studies have demonstrated that adversaries can access sensitive user data through membership inference attacks (MIAs) in Federated Learning (FL). Introducing differential privacy (DP) into the FL framework is an effective way to enhance the privacy of FL. Nevertheless, in differentially private federated learning (DP-FL), local gradients become excessively sparse in certain training rounds. Especially when training with low privacy budgets, there is a risk of introducing excessive noise into clients' gradients. This issue can lead to a significant degradation in the accuracy of the global model. Thus, how to balance user privacy and global model accuracy becomes a challenge in DP-FL. To this end, we propose an approach, known as differential privacy federated aggregation, based on significant gradient protection (DP-FedASGP). DP-FedASGP can mitigate excessive noise by protecting significant gradients and accelerate the convergence of the global model by calculating dynamic aggregation weights for gradients. Experimental results show that DP-FedASGP achieves privacy protection comparable to DP-FedAvg and cpSGD (communication-private SGD based on gradient quantization) but outperforms DP-FedSNLC (sparse noise based on clipping losses and privacy budget costs) and FedSMP (sparsified model perturbation). Furthermore, the average global test accuracy of DP-FedASGP across four datasets and three models is about 2.62%, 4.71%, 0.45%, and 0.19% higher than the above methods, respectively. These improvements indicate that DP-FedASGP is a promising approach for balancing the privacy and accuracy of DP-FL.
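One plausible reading of "significant gradient protection" is: clip each update, then concentrate perturbation on the largest-magnitude ("significant") coordinates rather than noising the whole sparse vector. The sketch below illustrates that generic clip-then-noise idea; it is not DP-FedASGP's actual mechanism, and a real deployment would need proper sensitivity and privacy accounting.

```python
import numpy as np

def clip_and_noise_topk(grad, clip_c=1.0, sigma=0.5, k=2):
    """Clip to L2 norm clip_c, then add Gaussian noise only on the k
    largest-magnitude coordinates (illustrative, no formal DP guarantee)."""
    g = grad * min(1.0, clip_c / (np.linalg.norm(grad) + 1e-12))
    top = np.argsort(np.abs(g))[-k:]           # indices of significant coords
    out = g.copy()
    out[top] += np.random.normal(0.0, sigma * clip_c, size=k)
    return out

print(clip_and_noise_topk(np.array([0.01, -2.0, 3.0, 0.02])))
```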
{"title":"Balancing Privacy and Accuracy Using Significant Gradient Protection in Federated Learning","authors":"Benteng Zhang;Yingchi Mao;Xiaoming He;Huawei Huang;Jie Wu","doi":"10.1109/TC.2024.3477971","DOIUrl":"https://doi.org/10.1109/TC.2024.3477971","url":null,"abstract":"Previous state-of-the-art studies have demonstrated that adversaries can access sensitive user data by membership inference attacks (MIAs) in Federated Learning (FL). Introducing differential privacy (DP) into the FL framework is an effective way to enhance the privacy of FL. Nevertheless, in differentially private federated learning (DP-FL), local gradients become excessively sparse in certain training rounds. Especially when training with low privacy budgets, there is a risk of introducing excessive noise into clients’ gradients. This issue can lead to a significant degradation in the accuracy of the global model. Thus, how to balance the user's privacy and global model accuracy becomes a challenge in DP-FL. To this end, we propose an approach, known as differential privacy federated aggregation, based on significant gradient protection (DP-FedASGP). DP-FedASGP can mitigate excessive noises by protecting significant gradients and accelerate the convergence of the global model by calculating dynamic aggregation weights for gradients. Experimental results show that DP-FedASGP achieves comparable privacy protection effects to DP-FedAvg and cpSGD (communication-private SGD based on gradient quantization) but outperforms DP-FedSNLC (sparse noise based on clipping losses and privacy budget costs) and FedSMP (sparsified model perturbation). Furthermore, the average global test accuracy of DP-FedASGP across four datasets and three models is about \u0000<inline-formula><tex-math>$2.62$</tex-math></inline-formula>\u0000%, \u0000<inline-formula><tex-math>$4.71$</tex-math></inline-formula>\u0000%, \u0000<inline-formula><tex-math>$0.45$</tex-math></inline-formula>\u0000%, and \u0000<inline-formula><tex-math>$0.19$</tex-math></inline-formula>\u0000% higher than the above methods, respectively. These improvements indicate that DP-FedASGP is a promising approach for balancing the privacy and accuracy of DP-FL.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 1","pages":"278-292"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Collaborative Neural Architecture Search for Personalized Federated Learning
IF 3.6 | CAS Tier 2 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477945
Yi Liu;Song Guo;Jie Zhang;Zicong Hong;Yufeng Zhan;Qihua Zhou
Personalized federated learning (pFL) is a promising approach to train customized models for multiple clients over heterogeneous data distributions. However, existing works on pFL often rely on the optimization of model parameters and ignore the personalization demand on neural network architecture, which can greatly affect model performance in practice. Therefore, generating personalized models with different neural architectures for different clients is a key issue in implementing pFL in a heterogeneous environment. Motivated by Neural Architecture Search (NAS), a model architecture searching methodology, this paper aims to automate model design in a collaborative manner while achieving good training performance for each client. Specifically, we reconstruct the centralized searching of NAS into a distributed scheme called Personalized Architecture Search (PAS), where differentiable architecture fine-tuning is achieved via gradient-descent optimization, thus letting each client obtain the most appropriate model. Furthermore, to aggregate knowledge from heterogeneous neural architectures, a knowledge distillation-based training framework is proposed to achieve a good trade-off between generalization and personalization in federated learning. Extensive experiments demonstrate that our architecture-level personalization method achieves higher accuracy under non-IID settings without aggravating model complexity relative to state-of-the-art benchmarks.
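The "differentiable architecture fine-tuning" follows the DARTS idea: a discrete choice among candidate operators is relaxed into a softmax-weighted mixture, so architecture parameters can be trained by gradient descent alongside model weights. A minimal NumPy sketch of such a mixed operation (the operator set and names are illustrative):

```python
import numpy as np

def mixed_op(x, alphas, ops):
    """DARTS-style relaxation: y = sum_i softmax(alphas)_i * op_i(x)."""
    w = np.exp(alphas - alphas.max())
    w /= w.sum()
    return sum(wi * op(x) for wi, op in zip(w, ops))

ops = [lambda x: x,                    # identity / skip connection
       lambda x: np.maximum(x, 0.0),   # ReLU-like candidate
       lambda x: 0.5 * x]              # scaled-linear candidate
alphas = np.array([0.1, 2.0, -1.0])    # learned architecture parameters
print(mixed_op(np.array([-1.0, 2.0]), alphas, ops))
# After the search, each client keeps argmax(alphas) -- here the ReLU-like op.
```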
{"title":"Collaborative Neural Architecture Search for Personalized Federated Learning","authors":"Yi Liu;Song Guo;Jie Zhang;Zicong Hong;Yufeng Zhan;Qihua Zhou","doi":"10.1109/TC.2024.3477945","DOIUrl":"https://doi.org/10.1109/TC.2024.3477945","url":null,"abstract":"Personalized federated learning (pFL) is a promising approach to train customized models for multiple clients over heterogeneous data distributions. However, existing works on pFL often rely on the optimization of model parameters and ignore the personalization demand on neural network architecture, which can greatly affect the model performance in practice. Therefore, generating personalized models with different neural architectures for different clients is a key issue in implementing pFL in a heterogeneous environment. Motivated by Neural Architecture Search (NAS), a model architecture searching methodology, this paper aims to automate the model design in a collaborative manner while achieving good training performance for each client. Specifically, we reconstruct the centralized searching of NAS into the distributed scheme called Personalized Architecture Search (PAS), where differentiable architecture fine-tuning is achieved via gradient-descent optimization, thus making each client obtain the most appropriate model. Furthermore, to aggregate knowledge from heterogeneous neural architectures, a knowledge distillation-based training framework is proposed to achieve a good trade-off between generalization and personalization in federated learning. Extensive experiments demonstrate that our architecture-level personalization method achieves higher accuracy under the non-iid settings, while not aggravating model complexity over state-of-the-art benchmarks.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 1","pages":"250-262"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Heterogeneous and Adaptive Architecture for Decision-Tree-Based ACL Engine on FPGA
IF 3.6 | CAS Tier 2 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477955
Yao Xin;Chengjun Jia;Wenjun Li;Ori Rottenstreich;Yang Xu;Gaogang Xie;Zhihong Tian;Jun Li
Access Control Lists (ACLs) are crucial for ensuring the security and integrity of modern cloud and carrier networks by regulating access to sensitive information and resources. However, previous software and hardware implementations no longer meet the requirements of modern datacenters. The emergence of FPGA-based SmartNICs presents an opportunity to offload ACL functions from the host CPU, leading to improved network performance in datacenter applications. However, previous FPGA-based ACL designs lacked the flexibility to support different rulesets without hardware reconfiguration while maintaining high performance. In this paper, we propose HACL, a heterogeneous and adaptive architecture for a decision-tree-based ACL engine on FPGA. By employing techniques such as tree decomposition and recirculated pipeline scheduling, HACL can accommodate various rulesets without reconfiguring the underlying architecture. To facilitate the efficient mapping of different decision trees to memory and optimize the throughput of a ruleset, we also introduce a heterogeneous framework with a compiler on the CPU platform for HACL. We implement HACL on a typical SmartNIC and evaluate its performance. The results demonstrate that HACL achieves a throughput exceeding 260 Mpps when processing 100K-scale ACL rulesets, with low hardware resource utilization. By integrating more engines, HACL can achieve even higher throughput and support larger rulesets.
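To ground "decision-tree-based ACL engine": internal nodes cut a header field into ranges and leaves hold short rule lists searched in priority order, which is the lookup a hardware pipeline performs one node per stage. The HiCuts-style software model below is illustrative; the field names and tree shape are invented.

```python
import bisect

class Leaf:
    def __init__(self, rules):           # rules: [(priority, match_fn, action)]
        self.rules = rules

class Node:
    def __init__(self, field, cuts, children):
        self.field, self.cuts, self.children = field, cuts, children

def classify(node, pkt):
    """Walk range cuts down to a leaf, then linear-search its rules."""
    while isinstance(node, Node):
        node = node.children[bisect.bisect_right(node.cuts, pkt[node.field])]
    for _, match, action in sorted(node.rules, key=lambda r: r[0]):
        if match(pkt):
            return action
    return "deny"                        # default action

leaf_lo = Leaf([(1, lambda p: p["dport"] == 22, "permit-ssh")])
leaf_hi = Leaf([(1, lambda p: p["dport"] == 443, "permit-https")])
tree = Node("src", cuts=[2**31], children=[leaf_lo, leaf_hi])
print(classify(tree, {"src": 10, "dport": 22}))   # -> permit-ssh
```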
{"title":"A Heterogeneous and Adaptive Architecture for Decision-Tree-Based ACL Engine on FPGA","authors":"Yao Xin;Chengjun Jia;Wenjun Li;Ori Rottenstreich;Yang Xu;Gaogang Xie;Zhihong Tian;Jun Li","doi":"10.1109/TC.2024.3477955","DOIUrl":"https://doi.org/10.1109/TC.2024.3477955","url":null,"abstract":"Access Control Lists (ACLs) are crucial for ensuring the security and integrity of modern cloud and carrier networks by regulating access to sensitive information and resources. However, previous software and hardware implementations no longer meet the requirements of modern datacenters. The emergence of FPGA-based SmartNICs presents an opportunity to offload ACL functions from the host CPU, leading to improved network performance in datacenter applications. However, previous FPGA-based ACL designs lacked the necessary flexibility to support different rulesets without hardware reconfiguration while maintaining high performance. In this paper, we propose HACL, a heterogeneous and adaptive architecture for decision-tree-based ACL engine on FPGA. By employing techniques such as tree decomposition and recirculated pipeline scheduling, HACL can accommodate various rulesets without reconfiguring the underlying architecture. To facilitate the efficient mapping of different decision trees to memory and optimize the throughput of a ruleset, we also introduce a heterogeneous framework with a compiler in CPU platform for HACL. We implement HACL on a typical SmartNIC and evaluate its performance. The results demonstrate that HACL achieves a throughput exceeding 260 Mpps when processing 100K-scale ACL rulesets, with low hardware resource utilization. By integrating more engines, HACL can achieve even higher throughput and support larger rulesets.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 1","pages":"263-277"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enabling High Performance and Resource Utilization in Clustered Cache via Hotness Identification, Data Copying, and Instance Merging
IF 3.6 | CAS Tier 2 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477994
Hongmin Li;Si Wu;Zhipeng Li;Qianli Wang;Yongkun Li;Yinlong Xu
In-memory cache systems such as Redis provide low-latency and high-performance data access for modern internet services. However, in large-scale Redis systems, the workloads show strong skewness and varied locality, which degrades system performance and incurs low CPU utilization. Although many approaches address load imbalance, the two-layered architecture of Redis gives its workload skewness special characteristics. Redis first maps data into data groups, which is called Group Mapping; the data groups are then distributed to instances by Instance Mapping. Under Redis's layered architecture, this gives rise to a small number of hot-spot instances holding very few hot data groups, alongside a large number of remaining cold instances. Improving Redis's performance and CPU utilization therefore entails accurately identifying instance and data group hotness, and handling hot data groups and cold instances. We propose HPUCache+ to address the hot-spot problem via hotness identification, hot data copying, and cold instance merging. HPUCache+ accurately and dynamically detects instance and data group hotness based on multiple resources and workload characteristics at low cost. It enables access to multiple data copies by dynamically updating the cached mapping in the Redis client, achieving high user access performance with Redis client compatibility while providing a highly self-definable service level agreement. It also proposes an asynchronous instance merging strategy based on disk snapshots and temporal caches, which separates the massive data movement from the critical user access path to achieve high-performance instance merging. We implement HPUCache+ in Redis. Experiments show that, compared to the native Redis design, HPUCache+ achieves up to 2.3$\times$ and 3.5$\times$ throughput gains and 11.3$\times$ and 14.3$\times$ CPU utilization gains, respectively. It also achieves up to 50% less CPU and 75% less memory consumption compared to the state-of-the-art approach Anna.
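Redis's two-layer placement, the backdrop for HPUCache+, hashes each key into one of 16384 slots (Group Mapping) and assigns slots to instances (Instance Mapping). The sketch below models that plus the hot-data-copying idea of spreading reads of a hot slot across extra copies; the routing policy is a simplification of HPUCache+'s client-side cached mapping, and the instance names are invented.

```python
import binascii, random

def key_slot(key: str) -> int:
    """Group Mapping: Redis-style CRC16 (XMODEM) of the key, mod 16384."""
    return binascii.crc_hqx(key.encode(), 0) % 16384

def route_read(key, slot_to_instance, hot_copies):
    """Instance Mapping, with hot-slot reads spread over data copies."""
    s = key_slot(key)
    if s in hot_copies:                       # hot slot: pick any copy
        return random.choice(hot_copies[s])
    return slot_to_instance[s]                # cold slot: single owner

slot_to_instance = {s: f"inst{s % 4}" for s in range(16384)}
hot = key_slot("hot:item")
hot_copies = {hot: [slot_to_instance[hot], "inst-extra-1", "inst-extra-2"]}
print(route_read("hot:item", slot_to_instance, hot_copies))
```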
{"title":"Enabling High Performance and Resource Utilization in Clustered Cache via Hotness Identification, Data Copying, and Instance Merging","authors":"Hongmin Li;Si Wu;Zhipeng Li;Qianli Wang;Yongkun Li;Yinlong Xu","doi":"10.1109/TC.2024.3477994","DOIUrl":"https://doi.org/10.1109/TC.2024.3477994","url":null,"abstract":"In-memory cache systems such as Redis provide low-latency and high-performance data access for modern internet services. However, in large-scale Redis systems, the workloads show strong skewness and varied locality, which degrades system performance and incurs low CPU utilization. Though there are many approaches toward load imbalance, the two-layered architecture of Redis makes its workload skewness show special characteristics. Redis first maps data into data groups, which is called <i>Group Mapping. Then the data groups are distributed to instances by Instance Mapping.</i> Under Redis's layered architecture, it gives rise to a small number of hot-spot instances with very limited hot data groups, as well as a large number of remaining cold instances. To improve Redis's performance and CPU utilization, it entails the accurate identification of instance and data group hotness, and handling hot data groups and cold instances. We propose HPUCache+ to address the hot-spot problem via hotness identification, hot data copying, and cold instance merging. HPUCache+ accurately and dynamically detects instance and data group hotness based on multiple resources and workload characteristics at low cost. It enables access to multiple data copies by dynamically updating the cached mapping in Redis client, achieving high user access performance with Redis client compatibility, while providing highly self-definable service level agreement. It also proposes an asynchronous instance merging strategy based on disk snapshots and temporal caches, which separates the massive data movement from the critical user access path to achieve high-performance instance merging. We implement HPUCache+ into Redis. Experiments show that, compared to the native Redis design, HPUCache+ achieves up to 2.3<inline-formula><tex-math>$times$</tex-math></inline-formula> and 3.5<inline-formula><tex-math>$times$</tex-math></inline-formula> throughput gains, 11.3<inline-formula><tex-math>$times$</tex-math></inline-formula> and 14.3<inline-formula><tex-math>$times$</tex-math></inline-formula> CPU utilization gains, respectively. It also achieves up to 50% less CPU and 75% less memory consumption compared to the state-of-the-art approach Anna.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 2","pages":"371-385"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
NPC: A Non-Conflicting Processing-in-Memory Controller in DDR Memory Systems
IF 3.6 | CAS Tier 2 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-10 | DOI: 10.1109/TC.2024.3477981
Seungyong Lee;Sanghyun Lee;Minseok Seo;Chunmyung Park;Woojae Shin;Hyuk-Jae Lee;Hyun Kim
Processing-in-Memory (PIM) has emerged as a promising solution to the memory wall problem. Existing memory interfaces must support new PIM commands to utilize PIM, making the definition of PIM commands according to memory modes a major issue in the development of practical PIM products. For performance and OS transparency, the memory controller is responsible for changing the memory mode, which requires modifying the controller and resolving conflicts with existing functionalities. Additionally, it must operate so as to minimize mode transition overhead, which can cause significant performance degradation. In this study, we present NPC, a memory controller designed for mode-transition PIM that delivers PIM commands via the DDR interface. NPC issues PIM commands while transparently changing the memory mode with a dedicated scheduling policy that reduces the number of mode transitions through aggregative issuing. Moreover, existing functions, such as refresh, are optimized for PIM operation. We implement NPC in hardware and develop a PIM emulation system to validate it on FPGA platforms. Experimental results reveal that NPC is compatible with existing interfaces and functionality, and the proposed scheduling policy improves performance by 2.2$\times$ with balanced fairness, achieving up to 97% of the ideal performance. These findings have the potential to aid the application of PIM in real systems and contribute to the commercialization of mode-transition PIM.
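"Aggregative issuing" can be illustrated with a toy scheduler: queued commands are tagged with the memory mode they need (normal DDR vs. PIM), and the controller keeps serving same-mode commands, up to a fairness bound, before paying a mode-transition cost. The command format and parameters below are invented for illustration.

```python
from collections import deque

def issue(commands, max_run=4):
    """Greedy aggregative issuing: prefer commands matching the current
    mode, but switch after max_run in a row to preserve fairness.
    Returns the number of mode transitions incurred."""
    pending, mode, run, transitions = deque(commands), None, 0, 0
    while pending:
        pick = next((c for c in pending if c == mode), None) if run < max_run else None
        if pick is None:                 # nothing in-mode, or fairness bound hit
            pick = pending[0]
        pending.remove(pick)
        if pick != mode:
            transitions += mode is not None
            mode, run = pick, 0
        run += 1
    return transitions

trace = ["MEM", "PIM", "MEM", "PIM", "MEM", "PIM", "MEM", "PIM"]
print("no aggregation:", issue(trace, max_run=1))   # -> 7 transitions
print("aggregated    :", issue(trace, max_run=4))   # -> 1 transition
```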
{"title":"NPC: A Non-Conflicting Processing-in-Memory Controller in DDR Memory Systems","authors":"Seungyong Lee;Sanghyun Lee;Minseok Seo;Chunmyung Park;Woojae Shin;Hyuk-Jae Lee;Hyun Kim","doi":"10.1109/TC.2024.3477981","DOIUrl":"https://doi.org/10.1109/TC.2024.3477981","url":null,"abstract":"Processing-in-Memory (PIM) has emerged as a promising solution to address the memory wall problem. Existing memory interfaces must support new PIM commands to utilize PIM, making the definition of PIM commands according to memory modes a major issue in the development of practical PIM products. For performance and OS-transparency, the memory controller is responsible for changing the memory mode, which requires modifying the controller and resolving conflicts with existing functionalities. Additionally, it must operate to minimize mode transition overhead, which can cause significant performance degradation. In this study, we present NPC, a memory controller designed for mode transition PIM that delivers PIM commands via the DDR interface. NPC issues PIM commands while transparently changing the memory mode with a dedicated scheduling policy that reduces the number of mode transitions with aggregative issuing. Moreover, existing functions, such as refresh, are optimized for PIM operation. We implement NPC in hardware and develop a PIM emulation system to validate it on FPGA platforms. Experimental results reveal that NPC is compatible with existing interfaces and functionality, and the proposed scheduling policy improves performance by 2.2<inline-formula><tex-math>$boldsymbol{times}$</tex-math></inline-formula> with balanced fairness, achieving up to 97% of the ideal performance. These findings have the potential to aid the application of PIM in real systems and contribute to the commercialization of mode transition PIM.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 3","pages":"1025-1039"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143388596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0