
IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

A Dual-Mode Continuous–Time Sigma-Delta Modulator With a Reconfigurable Loop Filter Based on a Single Op-Amp Resonator
IF 2.8 · Zone 2 (Engineering & Technology) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-06-20 · DOI: 10.1109/TVLSI.2024.3414298
Young-Kyun Cho
This brief proposes a dual-mode continuous-time (CT) sigma-delta modulator (SDM) for switched-mode power supplies, comprising a switchable loop filter (LF) based on a single op-amp resonator (SOR). The proposed modulator adaptively switches the LF architecture between third and second order and optimizes the noise transfer function (NTF) using partial resistors according to the sampling frequency. This delivers the desired bandwidth and resolution while reducing design complexity and minimizing the need for tuning circuitry. Moreover, implementing the LF with the SOR improves both the power and area efficiency of the modulator in each operating mode by reducing the number of active components. The modulator was fabricated in a 0.18-μm CMOS process with an active area of 0.105 mm². It achieved peak signal-to-noise ratios (SNRs) of 66.0/65.3 dB for signal bandwidths of 0.5/1.1 MHz. Power consumption was 127/280 μW from a 1.8-V supply when clocked at 40/160 MHz. The figures of merit (FoMs) for the two modes were 82/93 fJ/conv.-step.
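The FoMs are quoted in fJ/conv.-step, which is the unit of the Walden FoM: P / (2·BW·2^ENOB), with ENOB = (SNDR - 1.76)/6.02. The abstract reports peak SNR but not peak SNDR, so the Python sketch below uses SNR as a stand-in. Because SNDR ≤ SNR, the results it prints (roughly 78 and 85 fJ/conv.-step) are lower bounds. They are consistent with the reported 82/93 fJ/conv.-step. Read this as a sanity check under that assumption. It is not a reproduction of the paper's calculation.

```python
def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR (here: SNR as a stand-in), in dB."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w: float, bandwidth_hz: float, sndr_db: float) -> float:
    """Walden FoM in joules per conversion step: P / (2 * BW * 2**ENOB)."""
    return power_w / (2.0 * bandwidth_hz * 2.0 ** enob(sndr_db))


# (power [W], signal bandwidth [Hz], peak SNR [dB]) as quoted in the abstract.
modes = {
    "40 MHz mode": (127e-6, 0.5e6, 66.0),
    "160 MHz mode": (280e-6, 1.1e6, 65.3),
}

for name, (p, bw, snr) in modes.items():
    fom_fj = walden_fom(p, bw, snr) * 1e15
    print(f"{name}: ENOB = {enob(snr):.2f} bits, "
          f"FoM (SNR-based lower bound) = {fom_fj:.1f} fJ/conv.-step")
```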
Citations: 0
Low-Precision Mixed-Computation Models for Inference on Edge
IF 2.8 · Zone 2 (Engineering & Technology) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-06-20 · DOI: 10.1109/TVLSI.2024.3409640
Seyedarmin Azizi;Mahdi Nazemi;Mehdi Kamal;Massoud Pedram
This article presents a mixed-computation neural network processing approach for edge applications that combines low-precision (low-width) Posit and low-precision fixed-point (FixP) number systems. The approach uses 4-bit Posit (Posit4), which has higher precision around 0, to represent highly sensitive weights, and 4-bit FixP (FixP4) to represent the remaining weights. A heuristic that analyzes the importance and quantization error of the weights is presented to assign the appropriate number system to each weight. In addition, a gradient approximation for the Posit representation is introduced to improve the quality of weight updates during backpropagation. Because fully Posit-based computation consumes considerable energy, neural network operations are carried out in FixP or in mixed Posit/FixP arithmetic. An efficient hardware implementation of a MAC operation with a Posit first operand and a FixP second operand and accumulator is presented. The efficacy of the proposed low-precision mixed-computation approach is extensively assessed on vision and language models. The results show that, on average, mixed computation achieves about 1.5% higher accuracy than FixP at a cost of 0.19% energy overhead.
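The Python sketch below makes the Posit4/FixP4 contrast concrete. It lists both 4-bit value grids and rounds a few sample weights onto each. It assumes standard posit decoding with es = 0 exponent bits. It also assumes a FixP4 format with 1 fractional bit, chosen so that both formats span roughly ±4. The abstract states neither parameter, and the sketch does not model the paper's sensitivity heuristic or weight scaling. Under these assumptions, Posit4 keeps steps of 0.25 near zero while still reaching ±4. The uniform FixP4 grid, by contrast, has a fixed step of 0.5.

```python
def decode_posit(bits: int, n: int = 4, es: int = 0) -> float:
    """Decode an n-bit posit with `es` exponent bits (NaR is returned as nan)."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")                 # NaR ("not a real")
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask               # negative posits are two's complements
    body = bits & ((1 << (n - 1)) - 1)      # the n-1 bits after the sign bit
    i = n - 2                               # index of the first regime bit
    first = (body >> i) & 1
    run = 0
    while i >= 0 and ((body >> i) & 1) == first:
        run += 1
        i -= 1
    k = run - 1 if first else -run          # regime value
    rem = max(i, 0)                         # bits left after the regime terminator
    tail = body & ((1 << rem) - 1)
    e_bits = min(es, rem)
    e = (tail >> (rem - e_bits)) << (es - e_bits)   # truncated exponent bits read as 0
    f_bits = rem - e_bits
    f = tail & ((1 << f_bits) - 1)
    value = 2.0 ** (k * (1 << es) + e) * (1.0 + f / (1 << f_bits))
    return -value if sign else value


def fixp_values(n: int = 4, frac_bits: int = 1) -> list[float]:
    """Every value of an n-bit two's-complement fixed-point format."""
    return [q / (1 << frac_bits) for q in range(-(1 << (n - 1)), 1 << (n - 1))]


def quantize(x: float, grid: list[float]) -> float:
    """Round to the nearest grid value (on a tie, the first grid entry wins)."""
    return min(grid, key=lambda v: abs(v - x))


posit4 = sorted(v for v in map(decode_posit, range(16)) if v == v)  # v == v drops NaR
fixp4 = fixp_values(frac_bits=1)
print("Posit4:", posit4)
print("FixP4: ", fixp4)
for w in (0.1, 0.3, 0.7, 3.2):
    print(f"{w}: Posit4 -> {quantize(w, posit4)}, FixP4 -> {quantize(w, fixp4)}")
```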
Citations: 0
Enhancing ConvNets With ConvFIFO: A Crossbar PIM Architecture Based on Kernel-Stationary First-In-First-Out Dataflow
IF 2.8 · Zone 2 (Engineering & Technology) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-06-17 · DOI: 10.1109/TVLSI.2024.3409648
Yu Qian;Liang Zhao;Fanzi Meng;Xiapeng Xu;Cheng Zhuo;Xunzhao Yin
Convolutional neural networks (ConvNets) have long been the model of choice for computer vision (CV) problems and have recently gained renewed traction. To compute ConvNets more efficiently, process-in-memory (PIM) architectures based on emerging nonvolatile memories (NVMs) such as RRAM have been widely studied. However, conventional NVM-based PIM suffers from various nonidealities, including IR drop, sneak-path currents, large analog-to-digital converter (ADC) overhead, device variations, circuit mismatch, and error propagation. In this work, we propose ConvFIFO, a crossbar-memory-based PIM architecture for ConvNets featuring a kernel-stationary dataflow. Through FIFO-type input and output buffers, lower row-activation parallelism, and more compact ADCs, ConvFIFO can maximize the reuse of inputs and partial sums to achieve a more balanced trade-off among throughput, accuracy, and area/energy consumption. Using SRAM-based FIFOs as the input/output buffers, ConvFIFO realizes a systolic architecture without moving weight data, bypassing the NVM endurance limitation and minimizing partial-sum movement. Moreover, the FIFO nature of the dataflow allows flexible pipeline design and load balancing. Compared to classical NVM-based PIM architectures such as ISAAC, ConvFIFO shows significant improvements across various ConvNet models: 1.66–1.69×/1.69–1.74×/4.23–4.79×/1.59–1.74× in energy consumption, latency, Ops/W, and Ops/s×mm², respectively. Compared to GPUs, ConvFIFO incurs an average inference accuracy loss of only 1.82%.
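A one-dimensional example shows the reuse idea behind a kernel-stationary FIFO dataflow. The weights stay fixed, as they would in a crossbar. Each input value enters a FIFO once and is reused for every output window that covers it. The Python sketch below is a generic illustration under that simplification. It is not ConvFIFO's microarchitecture, which the abstract describes as SRAM-based input/output FIFOs around NVM crossbars. `fifo_conv1d` and `direct_conv1d` are hypothetical helper names, and the second exists only as a reference check.

```python
from collections import deque


def fifo_conv1d(inputs, kernel):
    """Kernel-stationary 1-D convolution ('valid' cross-correlation, as in CNN layers).

    The kernel never moves; inputs stream through a FIFO as long as the kernel,
    so each input is fetched once and reused by every window that covers it.
    """
    window = deque(maxlen=len(kernel))   # FIFO: appending to a full deque drops the oldest
    outputs = []
    for x in inputs:
        window.append(x)
        if len(window) == len(kernel):
            # One "array read": dot product of the buffered inputs with the fixed weights.
            outputs.append(sum(w * a for w, a in zip(kernel, window)))
    return outputs


def direct_conv1d(inputs, kernel):
    """Reference: recompute every window from scratch."""
    k = len(kernel)
    return [sum(kernel[j] * inputs[i + j] for j in range(k))
            for i in range(len(inputs) - k + 1)]


x = [1, 2, 3, 4, 5, 6]
w = [1, 0, -1]
print(fifo_conv1d(x, w))   # [-2, -2, -2, -2]
```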
Citations: 0
A Design Framework for Generating Energy-Efficient Accelerator on FPGA Toward Low-Level Vision
IF 2.8 · Zone 2 (Engineering & Technology) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-06-17 · DOI: 10.1109/TVLSI.2024.3409649
Zikang Zhou;Xuyang Duan;Jun Han
Low-level vision algorithms play an increasingly crucial role in a wide range of applications, such as biomedicine, security, and autopilot. Low-level vision accelerators have also been extensively researched. Because low-level vision is often deployed in embedded devices, its accelerators need high energy efficiency. Meanwhile, the broad application scenarios of low-level vision drive its rapid iteration, and designing energy-efficient accelerators for these quickly evolving algorithms demands substantial effort. A design framework specifically tailored to generating low-level vision accelerators is therefore urgently needed. In this article, we propose EffiVision, an end-to-end algorithm-hardware generation framework on field-programmable gate arrays (FPGAs) that generates highly energy-efficient dedicated accelerators for low-level vision neural networks. EffiVision proposes a hardware template featuring multiple forms of parallelism and a large architecture exploration space, specifically designed to accommodate the characteristics of low-level vision networks. It then employs activation-weight-aware mixed-precision quantization and FPGA-aware NNLUTs to search for suitable hardware parameters within the template, generating highly energy-efficient accelerators tailored to low-level vision networks. We used EffiVision to generate hardware for three low-level vision neural networks on Xilinx FPGA development boards. These were the fast super-resolution convolutional neural network (FSRCNN), the denoising convolutional neural network (DnCNN), and the demosaicing convolutional neural network (DMCNN), and the best energy efficiencies achieved were 174.9, 97.8, and 92.7 GOPS/W, respectively. The generated FSRCNN and DnCNN accelerators are 1.11× and 3.37× more efficient than previous works.
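The abstract does not describe EffiVision's FPGA-aware NNLUTs in detail. Lookup-table approximations of this general kind replace a nonlinear function with piecewise-linear segments. Hardware then needs only a segment index, a table read, and one multiply-add. The Python sketch below is a generic version with uniform breakpoints, applied to `tanh`. The segment count, input range, and function choice are illustrative assumptions, not the paper's settings.

```python
import math


def build_pwl_lut(f, lo, hi, segments):
    """Sample f at uniform breakpoints and store one (slope, intercept) pair per segment."""
    h = (hi - lo) / segments
    table = []
    for i in range(segments):
        x0 = lo + i * h
        x1 = x0 + h
        slope = (f(x1) - f(x0)) / h
        table.append((slope, f(x0) - slope * x0))
    return table


def pwl_eval(table, lo, hi, x):
    """Evaluate the LUT: clamp x, pick the segment by index, then one multiply-add."""
    segments = len(table)
    x = min(max(x, lo), hi)
    idx = min(int((x - lo) / (hi - lo) * segments), segments - 1)
    slope, intercept = table[idx]
    return slope * x + intercept


lut = build_pwl_lut(math.tanh, -4.0, 4.0, 16)
xs = [-4.0 + 8.0 * i / 1000 for i in range(1001)]
max_err = max(abs(pwl_eval(lut, -4.0, 4.0, x) - math.tanh(x)) for x in xs)
print(f"max |error| over [-4, 4] with 16 segments: {max_err:.4f}")
```

With uniform segments of width h, the linear-interpolation error is bounded by h²/8 times the maximum of |f''|. FPGA-oriented variants typically tune the breakpoints and number formats instead of using this uniform grid.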
Citations: 0
ALT-Lock: Logic and Timing Ambiguity-Based IP Obfuscation Against Reverse Engineering
IF 2.8 · Zone 2 (Engineering & Technology) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-06-17 · DOI: 10.1109/TVLSI.2024.3411033
Jonti Talukdar;Woo-Hyun Paik;Eduardo Ortega;Krishnendu Chakrabarty
We present a logic ambiguity-based intellectual property (IP) obfuscation method that replaces traditional key gates with key-controlled functionally ambiguous logic gates, called LGA gates. We also protect timing paths by developing timing-ambiguous sequential cells called TA cells. We call this locking scheme ambiguous logic and timing logic locking (referred to as ALT-Lock). ALT-Lock ensures a two-pronged system-level security scheme where the attacker is forced to unlock not only combinational logic obfuscation but also timing obfuscation. We show that a combination of logic and timing ambiguity (TA) provides security against oracle-guided attacks. This method is superior to other traditional IP protection schemes such as combinational or sequential locking as it guarantees security against both oracle-guided and oracle-free attacks, while ensuring low power, performance, and area (PPA) overhead.
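A toy model helps picture the key-controlled gates in ALT-Lock. A 2-bit key selects which Boolean function a gate computes, so a wrong key silently changes the circuit's behavior. The Python sketch below uses a hypothetical gate library and netlist. It is not the actual LGA gate design, and it omits the TA cells. The sketch also shows what oracle-guided attacks exploit: input/output queries to an activated chip eliminate every candidate key that disagrees with the observed outputs. According to the abstract, ALT-Lock counters this threat by combining logic ambiguity with timing ambiguity.

```python
from itertools import product

# Hypothetical key-controlled gate: a 2-bit key selects its Boolean function.
GATE_FUNCS = {
    (0, 0): lambda a, b: a & b,          # AND
    (0, 1): lambda a, b: a | b,          # OR
    (1, 0): lambda a, b: a ^ b,          # XOR
    (1, 1): lambda a, b: 1 - (a & b),    # NAND
}

CORRECT_KEY = (1, 0)   # hypothetical secret key (the gate behaves as XOR)


def locked_circuit(inputs, key):
    """Toy netlist: y = G_key(a, b) XOR c."""
    a, b, c = inputs
    return GATE_FUNCS[key](a, b) ^ c


def oracle(inputs):
    """An activated chip: returns correct outputs but never reveals the key."""
    return locked_circuit(inputs, CORRECT_KEY)


def consistent_keys(observations):
    """Keys that reproduce every observed (input, output) pair."""
    return [k for k in GATE_FUNCS
            if all(locked_circuit(x, k) == y for x, y in observations)]


# Wrong keys corrupt the function for some input patterns.
for key in GATE_FUNCS:
    wrong = sum(locked_circuit(x, key) != oracle(x) for x in product((0, 1), repeat=3))
    print(key, "mismatching input patterns:", wrong)

# Oracle-guided pruning: each query eliminates keys that disagree with the chip.
obs = []
for x in [(1, 1, 0), (0, 0, 0)]:
    obs.append((x, oracle(x)))
    print("after", len(obs), "queries:", consistent_keys(obs))
```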
Citations: 0
Endurance-Aware Compiler for 3-D Stackable FeRAM as Global Buffer in TPU-Like Architecture
IF 2.8 · Zone 2 (Engineering & Technology) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-06-17 · DOI: 10.1109/TVLSI.2024.3412631
Yuan-Chun Luo;Anni Lu;Yandong Luo;Sou-Chi Chang;Uygar Avci;Shimeng Yu
Compared with static random access memory (SRAM) and embedded dynamic random access memory (eDRAM) at the same technology node, emerging nonvolatile memories used as embedded memories offer low leakage power and high memory density. However, emerging memories generally suffer from limited cycling endurance. For read/write-intensive applications, this limited endurance could become a bottleneck on the lifetime of the overall system. In this work, Intel's reported prototype 3-D stackable ferroelectric random access memory (FeRAM) is considered as the global buffer memory of a tensor-processing-unit (TPU)-like architecture. An endurance-aware compiler is proposed to evaluate the maximum number of deep neural network (DNN) trainings given the experimentally measured endurance limit. In addition, the proposed compiler applies two strategies to alleviate the endurance issue: wear leveling, and dual-mode operation between volatile and nonvolatile modes. Thanks to wear leveling and dual-mode operation, the maximum number of trainings increases by 6× to 300× and by 4× to 58×, respectively. Finally, a guideline relating system endurance (maximum number of trainings) to a given memory device endurance is provided to bridge the gap between memory device engineers and system designers.
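Wear leveling, the first of the two strategies, spreads writes across the buffer so that no single row reaches its endurance limit early. The Python sketch below uses a deliberately simple model: a buffer of rows written in fixed-size windows, with wear leveling implemented as round-robin rotation of the window's starting row. The model, the `max_trainings` helper, and all parameter values are hypothetical and do not describe the paper's compiler. The example uses 64 rows, 8-row writes, 10 writes per training, and an endurance of 1000 writes. With these values, rotation extends the count from 100 to 800 trainings. The 8× gain simply reflects the toy parameters (64/8 rows) and is unrelated to the 6× to 300× range reported above.

```python
def max_trainings(num_rows, rows_per_write, writes_per_training, endurance, rotate):
    """Count complete trainings before any buffer row would exceed its write endurance."""
    if rows_per_write <= 0 or writes_per_training <= 0:
        raise ValueError("each training must write at least one row")
    wear = [0] * num_rows
    offset = 0
    trainings = 0
    while True:
        trial = wear[:]                                   # wear after one more training
        for _ in range(writes_per_training):
            for r in range(rows_per_write):
                trial[(offset + r) % num_rows] += 1
            if rotate:                                    # wear leveling: shift the window
                offset = (offset + rows_per_write) % num_rows
        if max(trial) > endurance:
            return trainings
        wear = trial
        trainings += 1


for rotate in (False, True):
    n = max_trainings(num_rows=64, rows_per_write=8, writes_per_training=10,
                      endurance=1000, rotate=rotate)
    print(f"rotate={rotate}: {n} trainings")   # 100 without rotation, 800 with it
```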
Citations: 0
Gain and Power Enhancement With Coupled Technique for a Distributed Power Amplifier in 0.25-μm GaN HEMT Technology
IF 2.8 · Zone 2 (Engineering & Technology) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-06-17 · DOI: 10.1109/TVLSI.2024.3411143
Xu Yan;Jingyuan Zhang;Guansheng Lv;Wenhua Chen;Yongxin Guo
In this article, a fully integrated 1.0–11.0-GHz wideband distributed power amplifier (DPA) monolithic microwave integrated circuit (MMIC) design is presented. In particular, the DPA adopts a coupled technique with bandpass characteristic (CTB) between the kth output node and the (k+1)th input node of the amplification units (AUs). This creates an additional signal reuse path (SRP) that reuses part of the output signal by superimposing it on the input signal, and the combined signal is then reamplified onto the output artificial transmission line (O-ATML). Moreover, owing to the bandpass characteristic, the signal reuse can be targeted at the upper edge of the operating band to alleviate sharp gain and power roll-off. By carefully controlling the SRP, the overall gain, output power, and bandwidth are enhanced and extended. The systematic design approach for the DPA is detailed, together with circuit implementations and optimizations. To validate the proposed concept, a DPA MMIC prototype is implemented and fabricated in a commercial 0.25-μm gallium nitride (GaN)-on-silicon carbide (SiC) high-electron-mobility transistor (HEMT) process. The layout is compact, with a die size of 3.36 mm². With a 28-V VDD supply, the measured results show a flat 14.8 ± 1.0-dB small-signal gain over a 10.0-GHz-wide operating bandwidth, with good impedance matching. A saturated output power (P_sat) of 7.25 W is achieved, with peak power-added efficiency (PAE) exceeding 38.7%. The proposed DPA achieves a power density of around 1.54–2.16 W/mm² with an average PAE of 34.5% over the entire frequency range.
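Several of the headline numbers can be cross-checked with simple arithmetic. The Python snippet below assumes that power density means output power divided by the 3.36-mm² die area, which the abstract does not define explicitly. Under that assumption, the 7.25-W P_sat reproduces the upper 2.16-W/mm² figure. The lower 1.54-W/mm² figure then corresponds to about 5.2 W at some point in the band. The snippet also converts P_sat to dBm and expresses the 1.0–11.0-GHz band as a fractional bandwidth.

```python
import math

P_SAT_W = 7.25          # saturated output power from the abstract
DIE_AREA_MM2 = 3.36     # die size from the abstract
F_LO_GHZ, F_HI_GHZ = 1.0, 11.0

density = P_SAT_W / DIE_AREA_MM2                  # W/mm^2, assuming P_out / die area
p_sat_dbm = 10 * math.log10(P_SAT_W * 1e3)        # watts -> milliwatts -> dBm
frac_bw = (F_HI_GHZ - F_LO_GHZ) / ((F_HI_GHZ + F_LO_GHZ) / 2)
p_at_low_density = 1.54 * DIE_AREA_MM2            # output power implied by 1.54 W/mm^2

print(f"power density at P_sat: {density:.2f} W/mm^2")             # 2.16
print(f"P_sat: {p_sat_dbm:.1f} dBm")                                # 38.6
print(f"fractional bandwidth: {frac_bw:.1%}")                       # 166.7%
print(f"implied P_out at 1.54 W/mm^2: {p_at_low_density:.2f} W")    # 5.17
```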
Citations: 0
iEDCL: Streamlined, False-Error-Free Error Detection and Correction Scheme in a Near-Threshold Enabled 32-bit Processor
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-06-17 DOI: 10.1109/TVLSI.2024.3409315
Runze Yu; Zhenhao Li; Xi Deng; Zhaoxu Wang; Wei Jia; Haoming Zhang; Zhenglin Liu
This article presents internal error detection, correction, and latching (iEDCL), a designer-friendly, fully functional error detection and correction (EDAC) approach tailored for energy-efficient near-threshold systems capable of tolerating variations. It embeds error detection (ED), correction, and latching circuits within a flip-flop (FF) with an additional 15 transistors to monitor critical paths. Notably, iEDCL's error-aware capability remains stable despite clock latency and parasitic effects, relieving designers of extensive involvement and eliminating false errors. iEDCL is automatically implemented in an ARM Cortex-M0 processor at 55 nm without extra architecture modifications, incurring only a 6.78% area overhead. An adaptive voltage scaling (AVS) loop enables automatic operation, achieving high energy efficiency beyond the point of first failure while maintaining a predefined error rate. Measurement results obtained from different dies at various temperatures demonstrate significant energy savings for the iEDCL processor, with reductions of up to 16.9% and 49.1% compared to the critical baseline and signoff designs, respectively, while maintaining a 5% error rate at 16 MHz. To the best of our knowledge, this article presents one of the first FF EDAC implementations that is fully operational at near-threshold voltages without potential false errors while enhancing energy efficiency.
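The abstract combines two mechanisms: an error-detecting flip-flop and an AVS loop that holds a predefined error rate. The Python sketch below models both at a behavioral level. It uses the generic Razor-style scheme, where a main sample is taken at the clock edge and a delayed shadow sample is compared against it. It is not the iEDCL circuit, which the abstract describes only as 15 extra transistors inside a flip-flop. The delay model `path_delay_ns`, the `PATH_ACTIVITY` list, and the 1-mV voltage step are hypothetical. Only the 16-MHz clock and the 5% error-rate target come from the abstract.

```python
# Behavioral sketch, NOT the iEDCL circuit: a generic Razor-style detecting
# flip-flop plus a bang-bang AVS loop. The delay model and PATH_ACTIVITY
# values are hypothetical; 16 MHz and the 5% target come from the abstract.
from dataclasses import dataclass

CLOCK_PERIOD_NS = 62.5  # 16-MHz clock

# Hypothetical relative delays of the paths exercised on 20 successive cycles.
PATH_ACTIVITY = [0.60, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00, 0.65, 0.72,
                 0.78, 0.82, 0.88, 0.92, 0.97, 0.99, 0.62, 0.68, 0.74, 0.86]


@dataclass
class DetectingFF:
    """Main sample at the clock edge plus a late shadow sample."""
    q: int = 0
    error: bool = False

    def clock(self, value_at_edge: int, value_after_window: int) -> int:
        # A mismatch means the data arrived after the edge: flag the timing
        # error and forward the late (correct) value instead.
        self.error = value_at_edge != value_after_window
        self.q = value_after_window
        return self.q


def path_delay_ns(vdd_mv: int, activity: float) -> float:
    """Hypothetical delay model: delay grows as VDD approaches 350 mV."""
    return activity * 20_000.0 / (vdd_mv - 350)


def measure_error_rate(vdd_mv: int) -> float:
    """Fraction of cycles flagged by the detecting FF at a given VDD.
    Assumes the shadow window is always long enough to catch late data."""
    ff = DetectingFF()
    errors = 0
    for cycle, activity in enumerate(PATH_ACTIVITY):
        new_value = (cycle + 1) % 2  # data toggles every cycle
        late = path_delay_ns(vdd_mv, activity) > CLOCK_PERIOD_NS
        at_edge = ff.q if late else new_value  # a late path misses the edge
        ff.clock(at_edge, new_value)
        errors += ff.error
    return errors / len(PATH_ACTIVITY)


def avs_loop(target_rate: float, start_mv: int = 800, step_mv: int = 1,
             floor_mv: int = 400, iterations: int = 400) -> tuple[int, int]:
    """Lower VDD while the error rate meets the target, raise it otherwise.
    Returns (final VDD, lowest VDD that met the target), both in mV."""
    vdd, lowest_ok = start_mv, start_mv
    for _ in range(iterations):
        if measure_error_rate(vdd) <= target_rate:
            lowest_ok = min(lowest_ok, vdd)
            vdd = max(floor_mv, vdd - step_mv)
        else:
            vdd += step_mv
    return vdd, lowest_ok


if __name__ == "__main__":
    final_mv, lowest_mv = avs_loop(target_rate=0.05)
    print(final_mv, lowest_mv, measure_error_rate(lowest_mv))
```

The loop is bang-bang, so the supply dithers by one step around the lowest voltage that meets the target. For that reason the function also reports that voltage separately.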
Citations: 0
A Hardware and Software Co-Design for Energy-Efficient Neural Network Accelerator With Multiplication-Less Folded-Accumulative PE for Radar-Based Hand Gesture Recognition
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-06-17 DOI: 10.1109/TVLSI.2024.3409674
Fan Li; Yunqi Guan; Wenbin Ye
This work presents a novel lightweight neural network (NN) model and a dedicated NN accelerator for radar-based hand gesture recognition (HGR). The NN model employs symmetric weights, group 1-D convolution, and power-of-two (POT) quantization. It achieves 92.84% accuracy on a public dataset with only 4.8k parameters while reducing parameter storage by 40%. The custom accelerator features a multiplication-less folded-accumulative processing element (PE), group-wise computation optimization, and an efficient scheduling mechanism for fully connected (FC) layers. The accelerator is implemented on a Xilinx XC7S15 field-programmable gate array (FPGA) and in 65-nm CMOS technology. It surpasses existing solutions in power efficiency and cost-effectiveness, addressing the computational demands of IoT deployment.
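The abstract names power-of-two (POT) quantization and a multiplication-less PE, and these combine naturally with symmetric weights. The Python sketch below shows one plausible reading of that combination. It is not the authors' PE. It assumes that "symmetric weights" means mirror-symmetric 1-D kernels. It also assumes that "folded" means pre-adding the two inputs that share a mirrored weight, as in a folded (symmetric) FIR filter. Under these assumptions, each stored weight is used once per output and applied as a bit shift. The exponent range [-7, 0] and the fixed-point scaling are also assumptions.

```python
import math


def pot_quantize(w: float, min_exp: int = -7, max_exp: int = 0) -> tuple[int, int]:
    """Round w to the nearest power of two in the log2 domain.
    Returns (sign, exp) so that w ~= sign * 2**exp; sign is 0 for w == 0."""
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    return sign, max(min_exp, min(max_exp, exp))


def folded_pot_conv1d(x: list[int], half_kernel: list[tuple[int, int]],
                      k: int, min_exp: int = -7) -> list[int]:
    """'Valid' 1-D correlation of integer activations x with a mirror-symmetric
    length-k POT kernel, of which only the first ceil(k/2) taps are stored.
    Outputs are exact fixed-point integers scaled by 2**(-min_exp)."""
    if len(half_kernel) != (k + 1) // 2:
        raise ValueError("half_kernel must hold ceil(k/2) taps")
    out = []
    for n in range(len(x) - k + 1):
        acc = 0
        for i, (sign, exp) in enumerate(half_kernel):
            j = k - 1 - i
            # Fold: mirrored taps share one weight, so pre-add their inputs.
            s = x[n + i] + x[n + j] if j != i else x[n + i]
            # Multiplication-less: a POT weight is a left shift plus a sign.
            term = s << (exp - min_exp)
            if sign > 0:
                acc += term
            elif sign < 0:
                acc -= term
        out.append(acc)
    return out


if __name__ == "__main__":
    half = [pot_quantize(w) for w in (0.3, -0.12, 0.9)]  # -> 0.25, -0.125, 1.0
    y = folded_pot_conv1d([3, -1, 4, 1, -5, 9, 2], half, k=5)
    print(half, [v / 2**7 for v in y])
```

A length-K kernel in this form stores only ⌈K/2⌉ taps. The 40% storage reduction reported in the abstract applies to the whole model and is not reproduced here.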
Citations: 0
Enhanced-Linearity Wideband Full-Duplex Receiver With Shared Self-Interference Canceller
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-06-12 DOI: 10.1109/TVLSI.2024.3410010
Fan Chen; Wei Li; Chuangguo Wang; Yunyou Pu; Xingyu Ma; Shijiao Dong; Yun Wang; Hongtao Xu
A wideband full-duplex (FD) receiver with an enhanced-linearity technique and shared self-interference cancellation (SIC) is implemented in a 40-nm CMOS process. The design combines a Hilbert-transform-equalization (HTE)-based self-interference (SI) canceller with a translational loop. As a result, the FD receiver performs RF-domain cancellation and gains an extra auxiliary cancellation path by reusing the mixer in the translational loop. This auxiliary path minimizes the influence of the SI circuitry on the receiver front end. In addition, a self-loaded linearization technique is proposed and applied to both the receiver and the SI canceller, at the cost of acceptable noise degradation and extra power consumption. Owing to its 2-D regulation, the technique achieves a relatively robust linearity improvement and brings flexibility to circuit design. Measurement results show that the proposed FD receiver operates across 0.8–3.5 GHz with a gain of 29.0–31.8 dB and a noise figure (NF) of 3.68–5.23 dB. The proposed linearization technique improves receiver linearity by 3.2–4.7 dB with only 0.45–0.64-dB NF degradation. The canceller with the proposed linearization method achieves RF-domain delays ranging from 1.59 to 4.03 ns while improving linearity by more than 6.33 dB. With the self-loaded technique and shared SIC, more than 23.4 dB of RF-domain SI suppression is measured across a 40-MHz bandwidth (BW) with 64-QAM modulated signals in a circulator-based setup, with RX noise degradation of less than 1.38 dB.
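The SI-suppression figures above depend on how closely the canceller's replica matches the SI in amplitude, phase, and delay. The standard two-path cancellation relation is a general result, not one taken from this paper. It gives the suppression when a replica with relative amplitude a and phase error φ is subtracted from the SI as -10·log10(1 + a² - 2a·cos φ). The Python sketch below evaluates this relation. It then assumes the replica phase is calibrated at band center and shows how a residual delay error limits suppression across a 40-MHz band like the one used in the measurement. The function names are illustrative.

```python
import math


def cancellation_db(gain_ratio: float, phase_err_rad: float) -> float:
    """Suppression (dB) when a replica with relative amplitude gain_ratio and
    phase error phase_err_rad is subtracted from a tone:
    residual power = |1 - a*exp(j*phi)|^2 = 1 + a^2 - 2*a*cos(phi)."""
    residual = 1.0 + gain_ratio ** 2 - 2.0 * gain_ratio * math.cos(phase_err_rad)
    if residual <= 0.0:
        return math.inf  # ideal match in this model
    return -10.0 * math.log10(residual)


def worst_case_over_band(gain_ratio: float, delay_err_s: float,
                         bandwidth_hz: float) -> float:
    """Assumes the replica phase is calibrated at band center, so a residual
    delay error leaves a phase error of 2*pi*(f - fc)*delay_err_s, which is
    largest at the band edges, f - fc = +/- bandwidth_hz / 2."""
    phase = 2.0 * math.pi * (bandwidth_hz / 2.0) * delay_err_s
    return cancellation_db(gain_ratio, phase)


if __name__ == "__main__":
    for tau_ns in (0.1, 0.5, 1.0):
        db = worst_case_over_band(1.0, tau_ns * 1e-9, 40e6)
        print(f"delay error {tau_ns} ns -> {db:.1f} dB over 40 MHz")
```

With perfectly matched amplitude, a 1-ns delay error limits worst-case suppression over 40 MHz to about 18 dB, while a 0.1-ns error allows about 38 dB. This simple model suggests why a tunable delay range in the canceller matters, but it ignores the multipath and nonlinear effects a real circulator-based setup would add.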
Citations: 0