Pub Date: 2025-08-26 DOI: 10.1109/TVLSI.2025.3582145
Heng Zhang;Wenhe Yin;Sunan He;Yuan Du;Li Du
In the above article [1], the die photograph on the right side of the original Fig. 9 was inadvertently mirrored horizontally, as shown in Fig. 1. This occurred during the annotation process, where the image used had already been flipped without our awareness. As a result, the internal layout labeling (e.g., CIMA1, CIMA2, and ADC) appeared in reverse orientation relative to the actual die. Fig. 1. Difference clarification between the original Fig. 9 of our published article and the revised Fig. 9. Fig. 9. Die photograph and measurement setup for the proposed chip.
{"title":"Corrections to “An Efficient Two-Stage Pipelined Compute-in-Memory Macro for Accelerating Transformer Feed-Forward Networks”","authors":"Heng Zhang;Wenhe Yin;Sunan He;Yuan Du;Li Du","doi":"10.1109/TVLSI.2025.3582145","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3582145","url":null,"abstract":"In the above article [1], the die photograph on the right side of original Fig. 9 was inadvertently mirrored horizontally, as shown in Fig. 1. This occurred during the annotation process, where the image used had already been flipped without our awareness. As a result, the internal layout labeling (e.g., CIMA1, CIMA2, and ADC) appeared in reverse orientation relative to the actual die.Fig. 1.Difference clarification between the original Fig. 9 of our published article and the revised Fig. 9. Fig. 9.Die photograph and measure setup for the proposed chip.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 9","pages":"2602-2602"},"PeriodicalIF":3.1,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11142530","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144904873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-26 DOI: 10.1109/TVLSI.2025.3598544
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2025.3598544","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3598544","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 9","pages":"C3-C3"},"PeriodicalIF":3.1,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11142502","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144904832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-22 DOI: 10.1109/TVLSI.2025.3600042
Siying Wu;Yu Gong;Jiaao Dai;Da Song;Shouzhong Peng;Yue Zhang;You Wang;Weiqiang Liu
The rapid development of Internet of Things (IoT) devices has triggered massive data transmission. Meanwhile, advances in artificial intelligence (AI) introduce new security vulnerabilities in device interactions. These challenges demand lightweight yet robust security solutions. In this context, physical unclonable functions (PUFs) serve as critical hardware security primitives, enabling reliable authentication for edge devices. Nevertheless, PUFs are increasingly susceptible to novel threats, notably machine learning attacks. To address this vulnerability, we propose a novel double-layer dynamic challenge cross-selection magnetoresistive random access memory PUF (MPUF). This design leverages the inherent process variation in spin-transfer torque magnetoresistive random access memory (STT-MRAM) as an entropy source. The proposed structure incorporates an obfuscation decode circuit (ODC) that combines XOR gates and shift registers. It dynamically obfuscates interlayer relationships between two PUF arrays to enhance circuit nonlinearity. The simulation results demonstrate uniformity of 50.16%, uniqueness of 49.94%, and a worst-case bit error rate (BER) of 2.34% for $-25~^{\circ}$C to $125~^{\circ}$C and 1.56% for $0.5\sim 1.1$ V. In addition, four common machine learning models are used to attack this PUF, achieving accuracies of 50.49%, 50.49%, 50.48%, and 58.41%, which are close to a random guess. Compared with traditional PUF implementations, this work exhibits higher reliability and enhanced security while maintaining low power consumption of approximately 9.975 fJ/bit.
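The uniformity, uniqueness, and BER figures quoted in this abstract are conventionally derived from Hamming weights and Hamming distances of PUF response bit strings. The following is a minimal Python sketch of those textbook definitions; the function names and the toy responses are illustrative assumptions and are not data or code from the article.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which two equal-length responses differ."""
    return sum(x != y for x, y in zip(a, b))


def uniformity(response):
    """Percentage of 1 bits in a single response (ideal: 50%)."""
    return 100.0 * sum(response) / len(response)


def uniqueness(responses):
    """Mean pairwise Hamming distance between devices, in percent (ideal: 50%)."""
    pairs = list(combinations(responses, 2))
    n_bits = len(responses[0])
    total = sum(hamming(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n_bits)


def bit_error_rate(reference, remeasured):
    """Percentage of bits that flip between a reference and a re-measurement."""
    return 100.0 * hamming(reference, remeasured) / len(reference)


# Hypothetical 4-bit responses of three devices to the same challenge.
chips = [[1, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
print(uniformity(chips[0]))
print(uniqueness(chips))
print(bit_error_rate([1, 0, 1, 0], [1, 0, 1, 1]))
```

In practice the responses would be far longer and the BER would be taken as the worst case over the temperature and supply-voltage corners, which is how the abstract reports it.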
{"title":"Dynamic Challenge Cross-Selection Physical Unclonable Function Based on MRAM","authors":"Siying Wu;Yu Gong;Jiaao Dai;Da Song;Shouzhong Peng;Yue Zhang;You Wang;Weiqiang Liu","doi":"10.1109/TVLSI.2025.3600042","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3600042","url":null,"abstract":"The rapid development of Internet of Things (IoT) devices has triggered massive data transmission. Meanwhile, advances in artificial intelligence (AI) introduce new security vulnerabilities in device interactions. These challenges demand lightweight yet robust security solutions. In this context, physical unclonable functions (PUFs) serve as critical hardware security primitives, enabling reliable authentication for edge devices. Nevertheless, PUF is increasingly susceptible to novel threats, notably machine learning attacks. To address this security vulnerability to attacks, we propose a novel double-layer dynamic challenge cross-selection magnetoresistive random access memory PUF (MPUF). This design leverages the inherent process variation in spin-transfer torque magnetoresistive random access memory (STT-MRAM) as an entropy source. The proposed structure incorporates an obfuscation decode circuit (ODC) that combines <sc>xor</small> gates and shift registers. It dynamically obfuscates interlayer relationships between two PUF arrays to enhance circuit nonlinearity. The simulation results demonstrate uniformity of 50.16%, uniqueness of 49.94%, a worst bit error rate (BER) of 2.34% for <inline-formula> <tex-math>$- 25~^{circ } $ </tex-math></inline-formula>C to <inline-formula> <tex-math>$125~^{circ } $ </tex-math></inline-formula>C and 1.56% for <inline-formula> <tex-math>$0.5sim 1.1$ </tex-math></inline-formula> V. In addition, four common machine learning models are used to attack this PUF, achieving accuracies of 50.49%, 50.49%, 50.48%, and 58.41%, which are close to a random guess. 
Compared with traditional PUF implementations, this work exhibits higher reliability and enhanced security while maintaining low power consumption of approximately 9.975 fJ/bit.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 12","pages":"3510-3514"},"PeriodicalIF":3.1,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145595134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-21 DOI: 10.1109/TVLSI.2025.3599864
Xi Deng;Zhaoxu Wang;Zhenhao Li;Runze Yu;Haoming Zhang;Liaoyuan Li;Zhenglin Liu
This brief presents a 55-nm level shifter (LS) that enables wide voltage range conversion from 80 mV to 1.2 V with high energy efficiency and fast transition speed. The proposed design incorporates a complementary output buffer and an assist discharge path to suppress the short-circuit current and enhance the transition speed. A multithreshold transistor strategy is adopted to expand the input range and reduce static power. Measurement results across 15 samples demonstrate robust subthreshold performance with 4.4-ns transition delay and 49.1-fJ/transition energy during 0.3–1.2-V conversion at 1 MHz. The measured average minimum convertible input voltages are 80 and 139 mV at input frequencies of 50 kHz and 1 MHz, respectively. The compact layout occupies only 7.96 $\mu$m$^2$. Compared to the best benchmarked prior work, the proposed LS achieves 33.8% improvement in energy-delay metrics, making it a highly efficient and scalable solution for energy-constrained systems and the Internet of Things (IoT).
{"title":"A Fast and Energy-Efficient Level Shifter With Complementary Output Buffer for Energy-Constrained Systems","authors":"Xi Deng;Zhaoxu Wang;Zhenhao Li;Runze Yu;Haoming Zhang;Liaoyuan Li;Zhenglin Liu","doi":"10.1109/TVLSI.2025.3599864","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3599864","url":null,"abstract":"This brief presents a 55-nm level shifter (LS) that enables wide voltage range conversion from 80mV to 1.2V with high energy efficiency and fast transition speed. The proposed design incorporates a complementary output buffer and an assist discharge path to suppress the short-circuit current and enhance the transition speed. A multithreshold transistor strategy is adopted to expand the input range and reduce static power. Measurement results across 15 samples demonstrate robust subthreshold performance with 4.4-ns transition delay and 49.1-fJ/transition energy during 0.3–1.2-V conversion at 1MHz. The measured average minimum convertible input voltages are 80 and 139mV at input frequencies of 50kHz and 1MHz, respectively. The compact layout occupies only 7.96<inline-formula> <tex-math>$mu $ </tex-math></inline-formula>m<sup>2</sup>. 
Compared to the best benchmarked prior work, the proposed LS achieves 33.8% improvement in energy-delay metrics, making it a highly efficient and scalable solution for energy-constrained systems and the Internet of Things (IoT).","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 12","pages":"3515-3519"},"PeriodicalIF":3.1,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145595103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Traditional wake-up receiver (WuRX) systems based on voltage-domain comparators for weak signal detection can readily suffer from false triggering due to noise. To address this issue, this work analyzes the mechanism of noise-induced failure in traditional voltage-domain comparators and proposes a time-domain comparator (TDCMP) scheme based on time-domain integration. By temporally integrating the input signal, the noise immunity of the comparator is significantly enhanced. To suppress process, supply voltage, and temperature (PVT) drift in the TDCMP, this work designs a frequency-locked loop (FLL) that employs a voltage-controlled oscillator (VCO) isomorphic to the TDCMP’s voltage-controlled delay line (VCDL) for drift calibration while simultaneously providing the clock signal for the WuRX. Implemented in a 65-nm CMOS process, the core chip area is 0.23 mm$^2$, with a total system power consumption of 20.9 nW. The measurement results demonstrate that the proposed TDCMP enhances the WuRX sensitivity by 4 dB.
{"title":"A Time-Domain Integration Comparison Scheme With Noise Immunity for Wake-Up Receivers","authors":"Hongjian Lan;Rong Zhou;Jianhang Yang;Tianxu Chen;Zhen Li;Bowen Wang;Zhangming Zhu","doi":"10.1109/TVLSI.2025.3596260","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3596260","url":null,"abstract":"Traditional wake-up receiver (WuRX) systems based on voltage-domain comparators for weak signal detection can readily suffer from false triggering due to noise. To address this issue, this work analyzes the mechanism of noise-induced failure in traditional voltage-domain comparator and proposes a time-domain comparator (TDCMP) scheme based on time-domain integration. By temporally integrating the input signal, the noise immunity of the comparator is significantly enhanced. To suppress process, supply voltage, and temperature (PVT) drift in the TDCMP, this work designs a frequency-locked loop (FLL) that employs a voltage-controlled oscillator (VCO) isomorphic to the TDCMP’s voltage-controlled delay line (VCDL) for drift calibration while simultaneously providing the clock signal for the WuRX. Implemented in a 65-nm CMOS process, the core chip area is 0.23 mm<sup>2</sup>, with a total system power consumption of 20.9 nW. The measurement results demonstrate that the proposed TDCMP enhances the WuRX sensitivity by 4 dB.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 12","pages":"3500-3504"},"PeriodicalIF":3.1,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145595141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-06 DOI: 10.1109/TVLSI.2025.3593020
Jiayu Liu;Yuanhang Li;Zhengyang Huang;Chao Chen;Ruiqi Chen;Bruno da Silva
Sample entropy (SampEn) is an algorithm within information entropy that enables effective analysis of biological signals. Due to the need for extensive similarity matching operations, the SampEn calculation process is time-consuming. Although a series of fast SampEn algorithms have been proposed, they remain time-intensive when processing large data volumes. Additionally, previous field-programmable gate array (FPGA)-based hardware accelerators designed for SampEn suffer from architectural design limitations, consuming substantial on-chip memory resources and operating at low frequencies. In this article, we propose FASE, an FPGA-based accelerator for lightweight sample entropy (LW-SampEn) with Monte Carlo (MC) sampling. The FASE design comprises two main parts: algorithm and hardware optimizations. On the algorithmic side, we introduce MC sampling into the merge-sort-based LW-SampEn algorithm, named MCLW-SampEn. MCLW-SampEn effectively reduces the computation load for large data volumes while maintaining algorithmic accuracy. For hardware, we first design efficient sorting and allocation modules to address boundary localization and load imbalance issues in previous accelerator designs. Then, we replicate the computation across the main phases to enable parallel processing. Finally, we deploy the design on the Pynq-Z2 board for validation. Experimental results show that the proposed MCLW-SampEn algorithm achieves an average speedup of $3\times$ over the LW-SampEn algorithm, with accuracy losses kept within 0.5%. Compared to state-of-the-art (SOTA) designs, FASE achieves an average speedup of $12.8\times$ while reducing power consumption by 89.3%. Ablation studies indicate that, for the same algorithm, FASE offers a $7.4\times$ speedup over related FPGA designs.
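To make the cost of the similarity matching concrete, here is a minimal Python sketch of the direct O(N²) SampEn definition, together with a naive Monte Carlo estimator that samples random template pairs instead of enumerating all of them. This is not the merge-sort-based LW-SampEn or the MCLW-SampEn algorithm from the article, whose details the abstract does not give; the function names, the default parameters `m`, `r`, and `n_pairs`, and the sampling scheme are all assumptions chosen for illustration.

```python
import math
import random


def _match(x, i, j, length, r):
    """True if the two templates of the given length agree within tolerance r."""
    return max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r


def sample_entropy(x, m=2, r=0.2):
    """Direct SampEn: -ln(A / B) over all template pairs i < j."""
    n_templates = len(x) - m  # same template set for lengths m and m + 1
    a = b = 0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            if _match(x, i, j, m, r):
                b += 1
                # A length-(m + 1) match only needs one extra sample compared.
                if abs(x[i + m] - x[j + m]) <= r:
                    a += 1
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


def sample_entropy_mc(x, m=2, r=0.2, n_pairs=2000, seed=0):
    """Naive Monte Carlo estimate using randomly sampled template pairs."""
    rng = random.Random(seed)
    n_templates = len(x) - m
    a = b = 0
    for _ in range(n_pairs):
        i, j = rng.sample(range(n_templates), 2)  # two distinct templates
        if _match(x, i, j, m, r):
            b += 1
            if abs(x[i + m] - x[j + m]) <= r:
                a += 1
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


# A perfectly regular signal has zero sample entropy.
regular = [0.0, 1.0] * 50
print(sample_entropy(regular, m=2, r=0.1))
print(sample_entropy_mc(regular, m=2, r=0.1))
```

The exact version compares every pair of templates, which is the quadratic workload that fast algorithms and hardware accelerators target; the sampled version trades a controlled amount of accuracy for a fixed number of comparisons.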
{"title":"FASE: An FPGA-Based Accelerator for Lightweight Sample Entropy With Monte Carlo Sampling","authors":"Jiayu Liu;Yuanhang Li;Zhengyang Huang;Chao Chen;Ruiqi Chen;Bruno da Silva","doi":"10.1109/TVLSI.2025.3593020","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3593020","url":null,"abstract":"Sample entropy (SampEn) is an algorithm within information entropy that enables effective analysis of biological signals. Due to the need for extensive similarity matching operations, the SampEn calculation process is time-consuming. Although a series of fast SampEn algorithms have been proposed, they remain time-intensive when processing large data volumes. Additionally, previous field-programmable gate array (FPGA)-based hardware accelerators designed for SampEn suffer from architectural design limitations, consuming substantial on-chip memory resources and operating at low frequencies. In this article, we propose FASE, an FPGA-based accelerator for lightweight sample entropy (LW-SampEn) with Monte Carlo (MC) sampling. The FASE design comprises two main parts: algorithm and hardware optimizations. On the algorithmic side, we introduce MC sampling into the merge-sort-based LW-SampEn algorithm, named MCLW-SampEn. MCLW-SampEn effectively reduces the computation load for large data volumes while maintaining algorithmic accuracy. For hardware, we first design efficient sorting and allocation modules to address boundary localization and load imbalance issues in previous accelerator designs. Then, we replicate the computation across the main phases to enable parallel processing. Finally, we deploy the design on the Pynq-Z2 board for validation. Experimental results show that the proposed MCLW-SampEn algorithm achieves an average speed up of <inline-formula> <tex-math>$3times $ </tex-math></inline-formula> over the LW-SampEn algorithm, with accuracy losses kept within 0.5%. 
Compared to state-of-the-art (SOTA) designs, FASE achieves an average speed up of <inline-formula> <tex-math>$12.8times $ </tex-math></inline-formula> while reducing power consumption by 89.3%. Ablation studies indicate that, for the same algorithm, FASE offers a <inline-formula> <tex-math>$7.4times $ </tex-math></inline-formula> speedup over related FPGA designs.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 10","pages":"2883-2896"},"PeriodicalIF":3.1,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145141710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-06 DOI: 10.1109/TVLSI.2025.3589984
Rafael da Silva;Pedro T. L. Pereira;Mateus Grellert;Ricardo Reis
The versatile video coding (VVC) standard introduces several innovative tools designed to enhance coding efficiency compared with its predecessors. One example is the adoption of an alternative filter for fractional motion estimation (FME) that is part of the adaptive motion vector resolution (AMVR) extension. While this allows a more precise motion representation, it also incurs more complexity for hardware implementations that aim at supporting most VVC features. This work introduces a low-power hardware accelerator architecture specifically designed for FME with support for the AMVR extension of VVC. The proposed solution enables a systematic exploration of the design space of cross-layer approximate computing by combining approximations at both the operator and algorithm levels. This is achieved through the design of two novel architectures (2TAxA/4T and 2TAxA/2TAxA) alongside a newly proposed approximate filter applicable to both regular and alternative interpolation modes. Furthermore, we evaluate eight different approximate adder (AA) topologies to optimize power–quality tradeoffs. Experimental results demonstrate that, for a complete FME multifilter interpolation unit (MIU) and while maintaining an image quality threshold of $\text{SSIM} \geq 0.88$, our method achieves up to 72% power savings and 59.64% area savings.
{"title":"Cross-Layer Approximate Design of Low-Power Fractional Motion Estimation Accelerators for VVC","authors":"Rafael da Silva;Pedro T. L. Pereira;Mateus Grellert;Ricardo Reis","doi":"10.1109/TVLSI.2025.3589984","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3589984","url":null,"abstract":"The versatile video coding (VVC) standard introduces several innovative tools designed to enhance coding efficiency compared with its predecessors. One example is the adoption of an alternative filter for fractional motion estimation (FME) that is part of the adaptive motion vector resolution (AMVR) extension. While this allows a more precise motion representation, it also incurs more complexity for hardware implementations that aim at supporting most of VVC features. This work introduces a low-power hardware architecture accelerator specifically designed for FME with support for the AMVR extension of VVC. The proposed solution enables a systematic exploration of the design space of cross-layer approximate computing by combining approximations at both the operator and algorithm levels. This is achieved through the design of two novel architectures (<italic>2TAxA/4T</i> and <italic>2TAxA/2TAxA</i>) alongside a newly proposed approximate filter applicable to both regular and alternative interpolation modes. Furthermore, we evaluate eight different approximate adder (AA) topologies to optimize power–quality tradeoffs. 
Experimental results demonstrate that for a complete FME multifilter interpolation unit (MIU) and maintaining an image quality threshold of <inline-formula> <tex-math>$text {SSIM} geq 0.88$ </tex-math></inline-formula>, our method achieves up to 72% power savings and 59.64% area savings.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 9","pages":"2415-2423"},"PeriodicalIF":3.1,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144904817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-06 DOI: 10.1109/TVLSI.2025.3592906
Zhenhao Li;Runze Yu;Xi Deng;Zhaoxu Wang;Haoming Zhang;Zhenglin Liu
Traditional processors require substantial design margins to account for process, voltage, temperature, and aging (PVTA) variations, resulting in significant energy efficiency losses. Existing dynamic timing error detection and correction (EDaC) techniques reduce these margins but incur high area overhead and design complexity. In this brief, we propose a processor based on the RI5CY core that integrates an adaptive voltage frequency scaling (AVFS) system providing PVTA tolerance and voltage droop mitigation. This approach reduces overhead to only 0.065%, enables ultrafine-grained frequency adaptation, and actively mitigates abrupt voltage droops. Additionally, a novel baud rate adaptive UART (BRA-UART) module ensures robust communication across all frequencies. Our processor design achieves a 163% typical performance gain and a 37.7% power reduction in the logic circuitry at near-threshold voltage (NTV), substantially improving energy efficiency.
{"title":"An Efficient Wide-Voltage Processor With PVTA Tolerance, Voltage Droop Mitigation, and Runtime Ultrafine-Grained Frequency Adaptation","authors":"Zhenhao Li;Runze Yu;Xi Deng;Zhaoxu Wang;Haoming Zhang;Zhenglin Liu","doi":"10.1109/TVLSI.2025.3592906","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3592906","url":null,"abstract":"Traditional processors require substantial design margins to account for process, voltage, temperature, and aging (PVTA) variations, resulting in significant energy efficiency losses. Existing dynamic timing error detection and correction (EDaC) techniques reduce these margins but incur high area overhead and design complexity. In this brief, we propose a processor based on the RI5CY core that integrates a PVTA tolerance and voltage droop mitigation adaptive voltage frequency scaling (AVFS) system. This approach reduces overhead to only 0.065%, enables ultrafine-grained frequency adaptation, and actively mitigates abrupt voltage droops. Additionally, a novel baud rate adaptive UART (BRA-UART) module ensures robust communication across all frequencies. Our processor design achieves a 163% typical performance gain and a 37.7% power reduction in the logic circuitry at near-threshold voltage (NTV), substantially improving energy efficiency.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 11","pages":"3196-3200"},"PeriodicalIF":3.1,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145398671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-06 DOI: 10.1109/TVLSI.2025.3592460
Nengxu Zhu;Yiting Zhang;Fanyi Meng
This brief presents a novel compact, low-loss CMOS digital attenuator chip for low-power, scalable phased array systems. To achieve low-loss amplitude control at high-millimeter-wave (mmW) frequencies with minimal area overhead, the proposed design employs a novel series triple coupled transformer (STCT) structure, integrated with an impedance-tunable network (ITN) to enable multibit control within a compact footprint. Fabricated using a 65-nm bulk CMOS technology, the 5-bit attenuator chip demonstrates an insertion loss (IL) as low as 3.3 dB across 110–130 GHz with a 0.5-dB step resolution and an amplitude control range of 15.5 dB. The measured rms amplitude and phase errors are minimized to 0.4 dB and 5.2°, respectively. The core area of the chip is only 0.06 mm$^2$.
{"title":"A Novel Low-Loss CMOS Digital Step Attenuator for Low-Power Scalable Phased Array Systems","authors":"Nengxu Zhu;Yiting Zhang;Fanyi Meng","doi":"10.1109/TVLSI.2025.3592460","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3592460","url":null,"abstract":"This brief presents a novel CMOS compact, low-loss digital attenuator chip for low-power, scalable phased array systems. To achieve low-loss amplitude control at high-millimeter-wave (mmW) frequencies with minimal area overhead, the proposed design employs a novel series triple coupled transformer (STCT) structure, integrated with an impedance-tunable network (ITN) to enable multibit control within a compact footprint. Fabricated using a 65-nm-bulk CMOS technology, the 5-bit attenuator chip demonstrates an insertion loss (IL) as low as 3.3 dB across the 110–130 GHz with a 0.5-dB step resolution and an amplitude control range of 15.5 dB. The measured rms amplitude and phase errors are minimized to 0.4 dB and 5.2°, respectively. The core area of the chip is only 0.06 mm<sup>2</sup>.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 11","pages":"3191-3195"},"PeriodicalIF":3.1,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145398670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The estimation model-based alternate test strategy for analog integrated circuits (ICs) offers an effective way to reduce test costs. However, as a data-driven method, the estimation process of alternate test is opaque, leading to low test reliability. To address this problem, we propose an adaptive confidence interval-based alternate test (ACIT) to enhance the reliability of alternate test. Multiple estimators are implemented to generate the target parameters of each circuit synchronously. All estimations for the same sample are averaged to get the final result. The reliability of each final estimation is evaluated by comparing its adaptive confidence interval to the corresponding parameter boundary. The proposed adaptive confidence interval is obtained from the variance of the multiple outputs and the estimation-boundary distance. Estimations with confidence intervals crossing boundaries are identified as suspect results and returned for repeat testing by the conventional approach. The remaining results are classified as “pass” (entire confidence intervals within the qualified range) or “fail” (entire confidence intervals outside of the qualified range). Our approach is studied with simulation data and verified on commercial ICs. Results demonstrate that the ACIT can effectively eliminate misclassified circuits by identifying unreliable estimations.
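The pass/fail/suspect decision described in this abstract can be sketched in a few lines of Python. The sketch below averages the outputs of several estimators and builds a plain mean ± k·σ interval; the article's adaptive interval also depends on the estimation-boundary distance, and that formula is not given in the abstract, so it is deliberately left out here. The function name `classify`, the multiplier `k`, and the example values are hypothetical.

```python
from statistics import mean, stdev


def classify(estimates, lower, upper, k=2.0):
    """Classify one circuit parameter from several estimator outputs.

    estimates: outputs of the individual estimators for the same circuit.
    lower, upper: specification limits of the qualified range.
    k: interval width in standard deviations (an assumed, fixed multiplier).
    """
    center = mean(estimates)          # final estimate: average of all outputs
    half_width = k * stdev(estimates)  # spread across estimators
    lo, hi = center - half_width, center + half_width

    if lower <= lo and hi <= upper:
        return "pass"      # whole interval inside the qualified range
    if hi < lower or lo > upper:
        return "fail"      # whole interval outside the qualified range
    return "suspect"       # interval crosses a limit: retest conventionally


# Hypothetical gain estimates against a 0.9 to 1.1 specification window.
print(classify([1.00, 1.02, 0.98], 0.9, 1.1))
print(classify([1.30, 1.32, 1.28], 0.9, 1.1))
print(classify([1.08, 1.12, 1.10], 0.9, 1.1))
```

Only circuits labeled suspect are sent back to conventional specification testing, so the extra test cost scales with how many estimates fall near a limit rather than with the whole lot.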
{"title":"Adaptive Confidence Interval-Based Alternate Test for Reliability Enhancement","authors":"Jiaming Zhao;Shibo Chen;Naixin Zhou;Yijiu Zhao;Guibing Zhu","doi":"10.1109/TVLSI.2025.3591196","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3591196","url":null,"abstract":"The estimation model-based alternate test strategy for analog integration circuits (ICs) offers an effective way to reduce test costs. However, as a data-driven method, the estimation process of alternate test is invisible, leading to low test reliability. Hence, to address this problem, we propose an adaptive confidence interval-based alternate test (ACIT) to enhance the reliability of alternate test. Multiestimators are implemented to generate target parameters of each circuit synchronously. All estimations for the same sample are averaged to get the final result. The reliability of each final estimation is evaluated by comparing its adaptive confidence interval to the correlated parameter boundary. The proposed adaptive confidence interval is obtained from the variance of multioutputs and estimation-boundary distance. Estimations with confidence intervals crossing boundaries are identified as suspect results and returned to repeat testing by the conventional approach. The remaining results are classified as “pass” (entire confidence intervals within the qualified range) or “fail” (entire confidence intervals outside of the qualified range). Our approach is studied with simulation data and verified on commercial ICs. 
Results demonstrate that the ACIT can eliminate the misclassification circuits effectively by identifying unreliable estimations.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 11","pages":"3176-3185"},"PeriodicalIF":3.1,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145398667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}