
IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

A High-Throughput Constructive Interference Precoder for 16 × MU-MIMO Systems
IF 2.8; Zone 2 (Engineering & Technology); Q2, Computer Science, Hardware & Architecture; Pub Date: 2024-07-12; DOI: 10.1109/TVLSI.2024.3423341
Yu-Cheng Lin;Ren-Hao Chiou;Chia-Hsiang Yang
In a multiuser multiple-input multiple-output (MU-MIMO) downlink system, users are susceptible to interuser interference (IUI) because data are transmitted simultaneously over the same time-frequency resources. Conventionally, precoding algorithms aim to eliminate the IUI. However, constructive interference (CI) precoding can achieve better error performance by exploiting the IUI. This article presents a high-throughput CI precoder. Design optimization across the algorithm and architecture layers reduces the multiplication complexity by 81.6%. As the number of iterations needed for convergence varies, dynamic resource allocation is used to support each modulation mode with maximized utilization: time-multiplexing for the 4-QAM mode and parallel processing for the 16-QAM mode. The proposed symbol updater also allows more efficient scheduling. As a proof of concept, a CI precoder chip that supports up to $16\times$ MU-MIMO systems is designed in a 40-nm CMOS technology. The performance gains at a bit error rate (BER) of $10^{-4}$ are 10.7 and 12.5 dB for 4-QAM and 16-QAM, respectively, compared with conventional regularized zero-forcing (RZF) schemes. The precoder delivers a maximum throughput of 3.2 Gb/s at a clock frequency of 200 MHz for the $16\times$ MU-MIMO configuration.
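The reported figures can be put in perspective with a little arithmetic. The sketch below only restates numbers from the abstract; the helper name `db_to_linear` is illustrative and not taken from the article.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
throughput_bps = 3.2e9   # maximum throughput, 3.2 Gb/s
clock_hz = 200e6         # clock frequency, 200 MHz

# Average number of payload bits delivered per clock cycle.
bits_per_cycle = throughput_bps / clock_hz

def db_to_linear(gain_db: float) -> float:
    """Convert a power gain in dB to a linear ratio."""
    return 10 ** (gain_db / 10)

gain_4qam = db_to_linear(10.7)    # gain over RZF at BER = 1e-4, 4-QAM
gain_16qam = db_to_linear(12.5)   # gain over RZF at BER = 1e-4, 16-QAM

print(bits_per_cycle, round(gain_4qam, 2), round(gain_16qam, 2))
```

Under these numbers the chip sustains 16 bits per cycle, and the dB gains correspond to roughly an order of magnitude less required signal power than RZF at the same BER.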
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 10, pp. 1878-1888
Citations: 0
A nMOS-R Cross-Coupled Level Shifter With High dV/dt Noise Immunity for 600-V High-Voltage Gate Driver IC
IF 2.8; Zone 2 (Engineering & Technology); Q2, Computer Science, Hardware & Architecture; Pub Date: 2024-07-12; DOI: 10.1109/TVLSI.2024.3417385
Yu Lu;Xiaowu Cai;Jian Lu;Longli Pan;Jianying Dang;Yafei Xie;Xupeng Wang;Bo Li
In digital integrated circuits with multiple power domains, level shifters (LSs) are essential circuit elements that translate signals from a low-voltage domain to a high-voltage domain. However, high-frequency gate drivers can generate noise with slew rates of hundreds of volts per nanosecond (high dV/dt noise). Such high dV/dt noise can cause a conventional pulse-triggered cross-coupled LS (CCLS), which is used to control the high-side nMOS switch, to malfunction. In this article, a novel LS with noise immunity is proposed and investigated. In contrast to the conventional resistor-load LS, the proposed circuit adopts an nMOS-R cross-coupled (NRCC) LS and achieves selective filtering by exploiting a path that filters out the noise introduced by the dV/dt. The high-voltage gate drive integrated circuit (HVIC) is implemented in a 600 V silicon-on-insulator (SOI) BCD process. Analyses and experiments show that the proposed design helps the HVIC maintain a high common-mode transient immunity (CMTI) of up to 137 V/ns while allowing a negative VS swing down to -9.4 V under a 15 V supply voltage. Compared with the traditional HVIC with a resistor-load LS, the proposed HVIC with the NRCC LS improves dV/dt noise immunity by 182%.
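To see why slew rates of this size are troublesome, recall that a voltage ramp across any capacitance drives a displacement current $i = C\,dV/dt$. The sketch below evaluates it at the reported CMTI; the 1 pF parasitic capacitance is a hypothetical value chosen for illustration and does not come from the article.

```python
def displacement_current_a(c_farads: float, dv_dt_v_per_s: float) -> float:
    """Current through a capacitance under a voltage ramp: i = C * dV/dt."""
    return c_farads * dv_dt_v_per_s

C_PARASITIC = 1e-12   # hypothetical 1 pF parasitic capacitance (assumption)
CMTI = 137e9          # 137 V/ns from the abstract, expressed in V/s

i_noise = displacement_current_a(C_PARASITIC, CMTI)
print(f"{i_noise * 1e3:.0f} mA")
```

Even a picofarad-scale parasitic would inject on the order of a hundred milliamperes at this slew rate, which is the kind of disturbance a level shifter must reject to avoid false triggering.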
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 11, pp. 1993-2000
Citations: 0
Improvement in Resilience of AES Design With Reconfigured CFB Mode Against Power Attacks
IF 2.8; Zone 2 (Engineering & Technology); Q2, Computer Science, Hardware & Architecture; Pub Date: 2024-07-11; DOI: 10.1109/TVLSI.2024.3422501
Thockchom Birjit Singha;Basa Sanjana;Titu Mary Ignatius;Roy Paily Palathinkal;Shaik Rafi Ahamed
Advanced encryption standard (AES) is used to secure communication on Internet-of-Things (IoT) hardware. It can be implemented in various 128-bit modes, such as electronic code book (ECB), cipher block chaining (CBC), cipher feedback (CFB), output feedback (OFB), and counter (CTR), to facilitate parallel processing of data. The noninvasive nature of power analysis attacks (PAAs), which retrieve secret information from a physical device, renders such hardware vulnerable to adversaries. Moreover, the security of the aforementioned modes has not been clearly assessed; this work undertakes that assessment as a novel attempt. In addition, this work proposes a novel 64-bit version of CFB mode, which provides the highest security with respect to other modes and several unprotected AES designs. PAAs are performed on an ASIC platform utilizing the UMC 65-nm technology node and on a hardware experimental setup using a side-channel attack security evaluation board (SASEBO), both at a 16-MHz AES frequency with traces sampled at 1 GSa/s. The measurements to disclose (MTD) of >1 000 000 provided by the proposed CFB-64 is significantly higher than that provided by typical unprotected AES designs. It also offers the highest MTD and the lowest signal-to-noise ratio (SNR) and mutual information (MI) among the modes, indicating the highest security. The proposed CFB-64 acts as a countermeasure when integrated with an unprotected AES.
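For readers unfamiliar with CFB, the following is a minimal sketch of segment-based cipher feedback as defined in NIST SP 800-38A, run here with 64-bit segments over a 128-bit shift register. Two assumptions are worth stating loudly: `toy_block_encrypt` is a hash-based stand-in for AES (it is neither AES nor secure), and it is not confirmed that the article's reconfigured CFB-64 matches this textbook construction.

```python
import hashlib

BLOCK = 16  # 128-bit block size, as in AES

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES encryption of one block (NOT AES, NOT secure)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def cfb(key: bytes, iv: bytes, data: bytes, seg: int, decrypt: bool = False) -> bytes:
    """CFB mode with seg-byte segments; only the forward cipher is needed."""
    assert len(iv) == BLOCK and len(data) % seg == 0
    shift_reg, out = iv, bytearray()
    for i in range(0, len(data), seg):
        chunk = data[i:i + seg]
        # Keystream is the most significant seg bytes of the cipher output.
        keystream = toy_block_encrypt(key, shift_reg)[:seg]
        result = bytes(a ^ b for a, b in zip(chunk, keystream))
        # The ciphertext segment is fed back in both directions.
        cipher_seg = chunk if decrypt else result
        shift_reg = shift_reg[seg:] + cipher_seg
        out += result
    return bytes(out)

key, iv = b"k" * 16, b"\x00" * 16
msg = b"sixteen byte msg" * 2
ct = cfb(key, iv, msg, seg=8)   # 8-byte (64-bit) segments
assert cfb(key, iv, ct, seg=8, decrypt=True) == msg
```

With 64-bit segments the cipher core is invoked twice per 128 bits of data, so each invocation sees a partly shifted input; whether this is the property the authors exploit is not stated in the abstract.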
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 11, pp. 2149-2153
Citations: 0
P2-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer
IF 2.8; Zone 2 (Engineering & Technology); Q2, Computer Science, Hardware & Architecture; Pub Date: 2024-07-11; DOI: 10.1109/TVLSI.2024.3422684
Huihong Shi;Xin Cheng;Wendong Mao;Zhongfeng Wang
Vision transformers (ViTs) have excelled in computer vision (CV) tasks but are memory-consuming and computation-intensive, which challenges their deployment on resource-constrained devices. To tackle this limitation, prior works have explored ViT-tailored quantization algorithms but retained floating-point scaling factors, which incur nonnegligible requantization overhead, limiting ViTs' hardware efficiency and motivating more hardware-friendly solutions. To this end, we propose P2-ViT, the first power-of-two (PoT) posttraining quantization (PTQ) and acceleration framework for fully quantized ViTs. Specifically, on the quantization side, we explore a dedicated quantization scheme to effectively quantize ViTs with PoT scaling factors, thus minimizing the requantization overhead. Furthermore, we propose coarse-to-fine automatic mixed-precision quantization to enable better accuracy-efficiency tradeoffs. On the hardware side, we develop a dedicated chunk-based accelerator featuring multiple tailored subprocessors that individually handle ViTs' different types of operations, alleviating reconfiguration overhead. In addition, we design a tailored row-stationary dataflow to seize the pipeline processing opportunity introduced by our PoT scaling factors, thereby enhancing throughput. Extensive experiments consistently validate P2-ViT's effectiveness. In particular, our PoT scaling factors offer comparable or even superior quantization performance relative to the counterpart with floating-point scaling factors. Besides, we achieve up to $10.1\times$ speedup and $36.8\times$ energy saving over a GPU's Turing Tensor Cores, and up to $1.84\times$ higher computation utilization efficiency than state-of-the-art (SOTA) quantization-based ViT accelerators. Code is available at https://github.com/shihuihong214/P2-ViT.
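The hardware appeal of PoT scaling factors is that rescaling an integer accumulator by $2^{-k}$ reduces to a bit shift. The sketch below illustrates that general idea only; the function names and the sample values are hypothetical, and the code is not taken from the P2-ViT repository.

```python
import math

def nearest_pot_exponent(scale: float) -> int:
    """Return k such that 2**-k is the power of two closest to scale in the log domain."""
    return round(-math.log2(scale))

def requantize_shift(acc: int, k: int) -> int:
    """Rescale an integer accumulator by 2**-k with a rounding right shift (k >= 1)."""
    return (acc + (1 << (k - 1))) >> k

scale = 0.0123                     # hypothetical floating-point scaling factor
k = nearest_pot_exponent(scale)    # exponent of the nearest power of two
acc = 1000                         # hypothetical integer accumulator value

shifted = requantize_shift(acc, k)      # shift-only requantization
reference = round(acc * scale)          # floating-point requantization
print(k, shifted, reference)
```

The two results generally differ because the scale itself has been rounded to a power of two, which is why a quantization scheme designed around PoT factors is needed to preserve accuracy.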
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 9, pp. 1704-1717
Citations: 0
FELIX: FPGA-Based Scalable and Lightweight Accelerator for Large Integer Extended GCD
IF 2.8; Zone 2 (Engineering & Technology); Q2, Computer Science, Hardware & Architecture; Pub Date: 2024-07-10; DOI: 10.1109/TVLSI.2024.3417016
Samuel Coulon;Tianyou Bao;Jiafeng Xie
The extended greatest common divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and postquantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bézout coefficients $b_{a}$ and $b_{b}$, which satisfy $\mathrm{GCD}(a,b) = a\times b_{a} + b\times b_{b}$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation between 6479-bit integers is required for solving Nth-degree truncated polynomial ring unit (NTRU) trapdoors in Falcon, a National Institute of Standards and Technology (NIST)-selected postquantum digital signature scheme. To date, the existing literature has primarily focused on software-based implementations of XGCD. The few existing high-performance hardware architectures require significant hardware resources and may not be desirable for practical usage, while the lightweight architectures suffer from poor performance. To fill this research gap, this work proposes a novel FPGA-based scalable and lightweight accelerator for large integer XGCD (FELIX). First, a new algorithm suitable for scalable and lightweight computation of XGCD is proposed. Next, a hardware accelerator (FELIX) is presented, including both constant- and variable-time versions. Finally, a thorough evaluation is carried out to showcase the efficiency of the proposed FELIX. In certain configurations, FELIX involves 81% less equivalent area-time product (eATP) than the state-of-the-art design for 1024-bit integers, and achieves a 95% reduction in latency over software for 6479-bit integers (Falcon parameter set) with reasonable resource usage. Overall, the proposed FELIX is highly efficient, scalable, lightweight, and suitable for very large integer computation, making it the first such XGCD accelerator in the literature (to the best of our knowledge).
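The Bézout identity above can be checked with the textbook iterative extended Euclidean algorithm. This is the classical software method, shown only to make the definition concrete; it is not the new algorithm proposed for FELIX, which the abstract does not describe.

```python
from typing import Tuple

def xgcd(a: int, b: int) -> Tuple[int, int, int]:
    """Iterative extended Euclid: returns (g, b_a, b_b) with g = a*b_a + b*b_b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        # Each remainder stays a linear combination of a and b.
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

g, b_a, b_b = xgcd(240, 46)
assert g == 2 and 240 * b_a + 46 * b_b == g
```

Python integers are arbitrary precision, so the same function accepts 6479-bit operands; the division in every iteration is what makes a direct hardware mapping expensive.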
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 9, pp. 1684-1695
Open access: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10593812
Citations: 0
A Dual-Mode Buck Converter with Light-Load Efficiency Improvement and Seamless Mode Transition Technique
IF 2.8; Zone 2 (Engineering & Technology); Q2, Computer Science, Hardware & Architecture; Pub Date: 2024-07-09; DOI: 10.1109/TVLSI.2024.3422382
Chengzhi Xu;Xufeng Liao;Peiyuan Fu;Yongyuan Li;Lianxi Liu
To improve efficiency over a wide load range, a power converter for the Internet of Things (IoT) usually works in two modes: pulsewidth modulation (PWM) and pulse frequency modulation (PFM). A mixed load detection scheme is adopted to enable the appropriate mode under different loads: the analog detector provides accurate detection under heavy load, and the digital load detection improves the light-load efficiency. When the power converter operates in different modes, the control loops are different. A seamless mode transition technique (SMTT) is therefore presented in this article to improve the transient response during mode changes between PWM and PFM. A test chip was fabricated in a 0.18-$\mu$m standard CMOS process, and the chip area is $1.59\times 1.37$ mm$^2$. The experimental results show that the efficiency is above 85.3% for $V_{\text{IN}}=3.3$ V and $V_{\text{OUT}}=1.8$ V over the load range from 1 to 300 mA, while the peak efficiency reaches 96.1% at a 100-mA load. Compared with the case without the proposed technique, the under/overshoot voltage during the mode transition is reduced by more than 55%.
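The operating point quoted above translates into a concrete power budget. The sketch below uses only the abstract's numbers; the duty-cycle formula is the standard ideal relation for a buck converter in continuous conduction, which is an assumption here since the abstract does not state the conduction mode.

```python
V_IN, V_OUT = 3.3, 1.8   # volts, from the abstract
I_LOAD = 0.100           # amperes, load at peak efficiency
ETA_PEAK = 0.961         # peak efficiency

duty_ideal = V_OUT / V_IN     # ideal buck in continuous conduction: D = V_OUT / V_IN
p_out = V_OUT * I_LOAD        # output power in watts
p_in = p_out / ETA_PEAK       # input power implied by the efficiency
p_loss = p_in - p_out         # power dissipated in the converter

print(f"D = {duty_ideal:.3f}, loss = {p_loss * 1e3:.1f} mW")
```

At the peak-efficiency point the converter delivers 180 mW while dissipating only a few milliwatts, which shows how small the loss budget is that mode selection has to protect at lighter loads.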
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 10, pp. 1782-1791
Citations: 0
A Low-Power Co-Processor to Predict Ventricular Arrhythmia for Wearable Healthcare Devices
IF 2.8; Zone 2 (Engineering & Technology); Q2, Computer Science, Hardware & Architecture; Pub Date: 2024-07-08; DOI: 10.1109/TVLSI.2024.3413584
Meenali Janveja;Rushik Parmar;Srichandan Dash;Jan Pidanic;Gaurav Trivedi
Ventricular arrhythmia (VA) is the most critical cardiac anomaly among all arrhythmia beats. Thus, it is imperative to predict the occurrence of VA to avoid sudden casualties caused by these arrhythmia beats. In the past, only a few hardware designs have been proposed to predict VA using various features derived from electrocardiogram (ECG) signals and processed using machine learning classifiers. However, these designs are either complex or lack sufficient prediction accuracy. Therefore, a deep neural network (DNN)-based co-processor for arrhythmia prediction is proposed in this article. It can predict VA at least 15 min before its occurrence with 91.6% accuracy. The co-processor architecture for arrhythmia prediction (CoAP) uses an optimal feature vector extracted from the ECG signal and an optimized DNN built on a novel approximate multiplier (AM). CoAP operates at 12.5 kHz and consumes $4.69~\mu\text{W}$ when implemented in SCL 180-nm bulk CMOS technology. The low-power realization of the proposed design and its higher accuracy, compared with well-known state-of-the-art methods, make it suitable for wearable devices.
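The power and clock figures imply a per-cycle energy budget that is easy to work out. The sketch below uses only the two numbers from the abstract; treating the 15-minute prediction horizon as an energy-accounting window is an illustrative choice, not something the article states.

```python
POWER_W = 4.69e-6    # 4.69 uW, from the abstract
CLOCK_HZ = 12.5e3    # 12.5 kHz, from the abstract

energy_per_cycle_j = POWER_W / CLOCK_HZ   # joules spent per clock cycle
energy_15min_j = POWER_W * 15 * 60        # joules spent over a 15-minute window

print(f"{energy_per_cycle_j * 1e12:.1f} pJ/cycle, "
      f"{energy_15min_j * 1e3:.3f} mJ per 15 min")
```

A budget of a few hundred picojoules per cycle and a few millijoules per quarter hour is consistent with operation from a small wearable battery.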
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 9, pp. 1672-1683. Publication date: 2024-07-08.
Citations: 0
An Area-Efficient Systolic Array Redundancy Architecture for Reliable AI Accelerator
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-07-08 DOI: 10.1109/TVLSI.2024.3421563
Hayoung Lee;Jongho Park;Sungho Kang
The increasing demand for data-intensive analytics, driven by the rapid advances in artificial intelligence (AI), has led to the proposal of various AI accelerators. However, as AI-based solutions are being applied to applications that require high accuracy and reliability, ensuring the dependability of these solutions has become a critical issue. In this brief, we present an area-efficient systolic array redundancy architecture for reliable AI accelerators. In the proposed architecture, computations assigned to faulty multiply-accumulate (MAC) units are bypassed using dedicated routes. Subsequently, the same computations are executed in shiftable redundant MACs or selectable redundant MACs. This ensures that calculations complete correctly without any reduction in performance. Moreover, the reassignment of computations can be efficiently managed through a simple scheduling algorithm. As a result, the proposed architecture achieves a high repair rate through the redundant MACs and effective computation reassignment. Despite these capabilities, the proposed architecture incurs only a small area overhead.
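The bypass-and-reassign idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' architecture or scheduling algorithm: it assumes a single row of MACs with spare units at the end of the row, and the names `remap_row` and `row_dot_products` are hypothetical. Each logical computation is shifted onto the next healthy physical MAC, so faulty units are skipped.

```python
def remap_row(n_logical, faulty, n_spares=1):
    """Assign each logical slot to a healthy physical MAC, skipping faulty ones.

    Hypothetical shift-style repair: physical MACs 0..n_logical+n_spares-1,
    with the spares sitting at the end of the row.
    Returns None when there are more faults than spares.
    """
    healthy = [m for m in range(n_logical + n_spares) if m not in faulty]
    if len(healthy) < n_logical:
        return None
    return healthy[:n_logical]


def row_dot_products(weights, x, faulty, n_spares=1):
    """Compute dot(weights[slot], x) per slot on the MAC chosen by remap_row."""
    mapping = remap_row(len(weights), faulty, n_spares)
    if mapping is None:
        raise RuntimeError("more faulty MACs than spares: row cannot be repaired")
    results = {}
    for slot, mac in enumerate(mapping):
        acc = 0
        for w, xi in zip(weights[slot], x):
            acc += w * xi  # multiply-accumulate on a healthy unit
        results[slot] = (mac, acc)  # (physical MAC used, accumulated value)
    return results


# Physical MAC 1 is faulty, so slots 1 and 2 shift right by one position.
repaired = row_dot_products([[1, 2], [3, 4], [5, 6]], [1, 1], faulty={1})
```

In this toy model the numerical results are unchanged by the repair; only the physical unit that produces each result moves. A row with more faults than spares is reported as unrepairable rather than silently computed.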
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 10, pp. 1950-1954. Publication date: 2024-07-08.
Citations: 0
Efficient Error Detection Cryptographic Architectures Benchmarked on FPGAs for Montgomery Ladder
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-07-05 DOI: 10.1109/TVLSI.2024.3419700
Kasra Ahmadi;Saeed Aghapour;Mehran Mozaffari Kermani;Reza Azarderakhsh
Elliptic curve scalar multiplication (ECSM) is a fundamental element of public key cryptography. ECSM implementations on deeply embedded architectures and the Internet-of-nano-Things are vulnerable to both permanent and transient errors, as well as fault attacks. Consequently, error detection is crucial. In this work, we present a novel algorithm-level error detection scheme for the Montgomery ladder, which is often used with Montgomery curves, a family of elliptic curves featuring highly efficient point arithmetic. Our error detection simulations achieve high error coverage under the loop-abort and scalar-bit-flipping fault models using a binary tree data structure. Assuming n is the size of the private key, the overhead of our error detection scheme is $O(n)$. Finally, we benchmark our proposed error detection scheme on both ARMv8 and field-programmable gate array (FPGA) platforms to illustrate the implementation and resource utilization. Deployed on Cortex-A72 processors, our proposed error detection scheme maintains a clock cycle overhead of less than 5.2%. In addition, integrating our error detection approach into FPGAs, including AMD/Xilinx Zynq Ultrascale+ and Artix Ultrascale+, results in comparable throughput and less than a 2% increase in area compared with the original hardware implementation. We note that we envision the proposed architectures being adopted in postquantum cryptography (PQC) based on elliptic curves.
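To make the ladder structure and the two fault models concrete, here is a minimal sketch. It is not the binary-tree scheme of the paper. As a simplifying assumption it runs the Montgomery ladder in the multiplicative group modulo p instead of on an elliptic curve, and it uses two textbook checks: an iteration counter (which catches a loop abort) and the ladder invariant r1 = r0 * g (which catches a corrupted register). The parameters `abort_after` and `corrupt_at` are hypothetical fault-injection hooks added only for the demonstration.

```python
class FaultDetected(Exception):
    """Raised when a consistency check on the ladder fails."""


def ladder_pow(g, k, p, n_bits, abort_after=None, corrupt_at=None):
    """Compute g**k mod p with a Montgomery ladder over n_bits scalar bits.

    Requires 0 <= k < 2**n_bits. Every iteration performs one multiply
    and one square regardless of the scalar bit.
    """
    r0, r1 = 1, g % p
    done = 0
    for i in reversed(range(n_bits)):
        if abort_after is not None and done == abort_after:
            break  # simulated loop-abort fault
        if (k >> i) & 1:
            r0, r1 = (r0 * r1) % p, (r1 * r1) % p
        else:
            r0, r1 = (r0 * r0) % p, (r0 * r1) % p
        done += 1
        if corrupt_at is not None and done == corrupt_at:
            r1 ^= 1  # simulated transient fault in one register
    if done != n_bits:
        raise FaultDetected("loop ended before all scalar bits were processed")
    if r1 != (r0 * g) % p:
        raise FaultDetected("ladder invariant r1 == r0 * g violated")
    return r0


result = ladder_pow(5, 117, 1000003, 8)  # fault-free run
```

Note the limitation of this sketch: the invariant holds for every scalar, so it cannot detect a flipped scalar bit. Covering that fault model requires an additional mechanism, which is what the abstract attributes to its binary tree data structure.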
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 11, pp. 2154-2158. Publication date: 2024-07-05.
Citations: 0
Secure Edge-Coded Signaling IoT Transceiver With Reduced Encryption Overhead
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-07-05 DOI: 10.1109/TVLSI.2024.3418713
Mizan Abraha Gebremicheal;Ibrahim M. Elfadel
The edge-coded signaling (ECS) protocol enables single-wire signaling in IoT devices and sensors using two important neuromorphic attributes. The first is the coding of bits as a stream of pulses (spikes), and the second is the circumvention of clock and data recovery (CDR) at the receiver. In addition, ECS can be endowed with strong, yet lightweight, security features using an ultralow-latency version of the A5/1 stream cipher. Such strong security comes at the expense of decreased data rates and significant area overhead. In this article, we introduce a new generation of secure ECS protocols that incorporates two notable improvements. The first is a more compact pulse stream definition that results in improved data rates for the plain ECS protocol. The second is a coding-aware version of the low-latency A5/1 stream cipher that results in minimal impact on the effective data rate of the transmission. Consequently, a new all-digital and secure ECS transceiver design is proposed, prototyped, and functionally verified in 65-nm technology. Compared with previous generations of secure ECS transceivers, this new design achieves an increase of approximately 138%, 199%, and 640% in minimum, average, and maximum data rates, respectively, and increases resiliency against brute-force attacks by a factor of 16. Furthermore, the ASIC implementation shows that it maintains the compact and energy-efficient features of the ECS architecture, using only 28 μW with an average energy efficiency of 2.745 pJ/bit and a gate count of approximately 2880 gates. This is a decrease of more than 40% in the equivalent gate count relative to the previous secure ECS generation.
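For readers unfamiliar with A5/1, the sketch below shows the structure of the commonly published baseline cipher: three LFSRs of 19, 22, and 23 bits, stepped under a majority rule. It is not the ultralow-latency or coding-aware variant described in the abstract, whose details are not given here. The register constants follow the widely circulated public description of A5/1; the helper names (`keystream`, `xor_bits`, `int_to_bits`) and the example key and frame values are hypothetical.

```python
MASKS = (0x07FFFF, 0x3FFFFF, 0x7FFFFF)       # register widths: 19, 22, 23 bits
TAPS = (0x072000, 0x300000, 0x700080)        # feedback tap positions
CLOCK_BITS = (0x000100, 0x000400, 0x000400)  # bits 8, 10, 10 feed the majority vote
OUT_BITS = (0x040000, 0x200000, 0x400000)    # most significant bit of each register


def parity(x):
    return bin(x).count("1") & 1


def shift(reg, i):
    """Step register i once: shift left and insert the tap parity at bit 0."""
    return ((reg << 1) & MASKS[i]) | parity(reg & TAPS[i])


def clock_majority(regs):
    """Step only the registers whose clocking bit agrees with the majority."""
    votes = [1 if r & c else 0 for r, c in zip(regs, CLOCK_BITS)]
    maj = 1 if sum(votes) >= 2 else 0
    return [shift(r, i) if votes[i] == maj else r for i, r in enumerate(regs)]


def keystream(key_bits, frame_bits, n):
    """Generate n keystream bits from 64 key bits and 22 frame-number bits."""
    regs = [0, 0, 0]
    for b in list(key_bits) + list(frame_bits):  # load phase: all registers step
        regs = [shift(r, i) ^ b for i, r in enumerate(regs)]
    for _ in range(100):                         # mixing phase: output discarded
        regs = clock_majority(regs)
    out = []
    for _ in range(n):
        regs = clock_majority(regs)
        out.append(sum(1 for r, o in zip(regs, OUT_BITS) if r & o) & 1)
    return out


def xor_bits(data_bits, ks):
    """Encrypt or decrypt: a stream cipher XORs data with the keystream."""
    return [d ^ k for d, k in zip(data_bits, ks)]


def int_to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]  # least significant bit first


key = int_to_bits(0x0123456789ABCDEF, 64)  # example key, illustration only
frame = int_to_bits(0x134, 22)             # example frame number
ks = keystream(key, frame, 114)
message = int_to_bits(0x2A5F, 16)
cipher = xor_bits(message, ks)
```

Because encryption is a plain XOR with the keystream, applying `xor_bits` a second time with the same keystream recovers the message. The 100 discarded mixing clocks in this baseline are one source of start-up latency, which gives some context for the abstract's emphasis on a low-latency variant.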
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 32, no. 9, pp. 1661-1671. Publication date: 2024-07-05.
Citations: 0