IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

Corrections to “An Efficient Two-Stage Pipelined Compute-in-Memory Macro for Accelerating Transformer Feed-Forward Networks”
IF 3.1; Zone 2, Engineering Technology; Q2, Computer Science, Hardware & Architecture; Pub Date: 2025-08-26; DOI: 10.1109/TVLSI.2025.3582145
Heng Zhang; Wenhe Yin; Sunan He; Yuan Du; Li Du
In the above article [1], the die photograph on the right side of the original Fig. 9 was inadvertently mirrored horizontally, as shown in Fig. 1. This occurred during annotation: the image used had already been flipped without our awareness. As a result, the internal layout labels (e.g., CIMA1, CIMA2, and ADC) appeared in reverse orientation relative to the actual die.
Fig. 1. Clarification of the difference between the original Fig. 9 of our published article and the revised Fig. 9.
Fig. 9. Die photograph and measurement setup for the proposed chip.
Published in vol. 33, no. 9, p. 2602. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11142530
Citations: 0
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information
IF 3.1; Zone 2, Engineering Technology; Q2, Computer Science, Hardware & Architecture; Pub Date: 2025-08-26; DOI: 10.1109/TVLSI.2025.3598544
Published in vol. 33, no. 9, p. C3. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11142502
Citations: 0
Dynamic Challenge Cross-Selection Physical Unclonable Function Based on MRAM
IF 3.1; Zone 2, Engineering Technology; Q2, Computer Science, Hardware & Architecture; Pub Date: 2025-08-22; DOI: 10.1109/TVLSI.2025.3600042
Siying Wu; Yu Gong; Jiaao Dai; Da Song; Shouzhong Peng; Yue Zhang; You Wang; Weiqiang Liu
The rapid development of Internet of Things (IoT) devices has triggered massive data transmission. Meanwhile, advances in artificial intelligence (AI) introduce new security vulnerabilities in device interactions. These challenges demand lightweight yet robust security solutions. In this context, physical unclonable functions (PUFs) serve as critical hardware security primitives, enabling reliable authentication for edge devices. Nevertheless, PUFs are increasingly susceptible to novel threats, notably machine learning attacks. To address this vulnerability, we propose a novel double-layer dynamic challenge cross-selection magnetoresistive random access memory PUF (MPUF). This design leverages the inherent process variation in spin-transfer torque magnetoresistive random access memory (STT-MRAM) as an entropy source. The proposed structure incorporates an obfuscation decode circuit (ODC) that combines XOR gates and shift registers. It dynamically obfuscates the interlayer relationships between two PUF arrays to enhance circuit nonlinearity. The simulation results demonstrate a uniformity of 50.16%, a uniqueness of 49.94%, and a worst-case bit error rate (BER) of 2.34% over $-25\,^{\circ}$C to $125\,^{\circ}$C and 1.56% over $0.5 \sim 1.1$ V. In addition, four common machine learning models are used to attack this PUF, achieving accuracies of 50.49%, 50.49%, 50.48%, and 58.41%, which are close to a random guess. Compared with traditional PUF implementations, this work exhibits higher reliability and enhanced security while maintaining a low energy consumption of approximately 9.975 fJ/bit.
Published in vol. 33, no. 12, pp. 3510-3514.
Citations: 0
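The uniformity, uniqueness, and BER figures in the MRAM PUF entry above can be illustrated with a minimal sketch. It assumes the definitions commonly used in the PUF literature (uniformity as the percentage of 1 bits in a response, uniqueness as the mean pairwise inter-chip Hamming distance, BER as the percentage of flipped bits); the abstract does not spell out its formulas, and the function names and toy responses below are hypothetical.

```python
from itertools import combinations


def uniformity(response):
    """Percentage of 1 bits in a single PUF response (ideal: 50%)."""
    return 100.0 * sum(response) / len(response)


def hamming_distance(a, b):
    """Number of bit positions at which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have equal length")
    return sum(x != y for x, y in zip(a, b))


def uniqueness(responses):
    """Mean pairwise inter-chip Hamming distance, as a percentage (ideal: 50%)."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        raise ValueError("need at least two responses")
    n_bits = len(responses[0])
    total = sum(hamming_distance(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n_bits)


def bit_error_rate(reference, measured):
    """Percentage of bits that flipped relative to the reference response."""
    return 100.0 * hamming_distance(reference, measured) / len(reference)


# Toy 8-bit responses from three hypothetical chips (illustrative data only).
chips = [
    [0, 1, 1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 1, 0, 1],
]
```

With real data, each response would come from a different chip answering the same challenge, and the BER reference would be the response captured at nominal temperature and supply voltage.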
A Fast and Energy-Efficient Level Shifter With Complementary Output Buffer for Energy-Constrained Systems
IF 3.1; Zone 2, Engineering Technology; Q2, Computer Science, Hardware & Architecture; Pub Date: 2025-08-21; DOI: 10.1109/TVLSI.2025.3599864
Xi Deng; Zhaoxu Wang; Zhenhao Li; Runze Yu; Haoming Zhang; Liaoyuan Li; Zhenglin Liu
This brief presents a 55-nm level shifter (LS) that enables wide-range voltage conversion from 80 mV to 1.2 V with high energy efficiency and fast transition speed. The proposed design incorporates a complementary output buffer and an assist discharge path to suppress the short-circuit current and enhance the transition speed. A multithreshold transistor strategy is adopted to expand the input range and reduce static power. Measurement results across 15 samples demonstrate robust subthreshold performance, with a 4.4-ns transition delay and 49.1 fJ of energy per transition during 0.3–1.2-V conversion at 1 MHz. The measured average minimum convertible input voltages are 80 and 139 mV at input frequencies of 50 kHz and 1 MHz, respectively. The compact layout occupies only 7.96 $\mu$m$^2$. Compared to the best benchmarked prior work, the proposed LS achieves a 33.8% improvement in energy-delay metrics, making it a highly efficient and scalable solution for energy-constrained systems and the Internet of Things (IoT).
Published in vol. 33, no. 12, pp. 3515-3519.
Citations: 0
A Time-Domain Integration Comparison Scheme With Noise Immunity for Wake-Up Receivers
IF 3.1; Zone 2, Engineering Technology; Q2, Computer Science, Hardware & Architecture; Pub Date: 2025-08-13; DOI: 10.1109/TVLSI.2025.3596260
Hongjian Lan; Rong Zhou; Jianhang Yang; Tianxu Chen; Zhen Li; Bowen Wang; Zhangming Zhu
Traditional wake-up receiver (WuRX) systems that use voltage-domain comparators for weak-signal detection readily suffer from noise-induced false triggering. To address this issue, this work analyzes the mechanism of noise-induced failure in traditional voltage-domain comparators and proposes a time-domain comparator (TDCMP) scheme based on time-domain integration. By temporally integrating the input signal, the noise immunity of the comparator is significantly enhanced. To suppress process, supply voltage, and temperature (PVT) drift in the TDCMP, this work designs a frequency-locked loop (FLL) that employs a voltage-controlled oscillator (VCO) isomorphic to the TDCMP’s voltage-controlled delay line (VCDL) for drift calibration while simultaneously providing the clock signal for the WuRX. Implemented in a 65-nm CMOS process, the core chip area is 0.23 mm$^2$, with a total system power consumption of 20.9 nW. The measurement results demonstrate that the proposed TDCMP enhances the WuRX sensitivity by 4 dB.
Published in vol. 33, no. 12, pp. 3500-3504.
Citations: 0
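The claim in the wake-up receiver entry above, that integrating the input over time improves noise immunity, can be illustrated numerically. This is only a behavioral sketch under stated assumptions, not the circuit from the article: the noise is assumed Gaussian and zero-mean, integration is modeled as a plain average of `n_integrate` samples, and the function name, threshold, and sample counts are hypothetical.

```python
import random


def count_false_triggers(n_trials, n_integrate, threshold=1.0, sigma=1.0, seed=0):
    """Count trials in which noise alone (no signal) exceeds the threshold.

    n_integrate = 1 models an instantaneous comparison; larger values model
    averaging the input over n_integrate noisy samples before comparing.
    """
    rng = random.Random(seed)
    triggers = 0
    for _ in range(n_trials):
        samples = [rng.gauss(0.0, sigma) for _ in range(n_integrate)]
        if sum(samples) / n_integrate > threshold:
            triggers += 1
    return triggers


# Same threshold and noise level; only the integration length differs.
instantaneous = count_false_triggers(2000, n_integrate=1)
integrated = count_false_triggers(2000, n_integrate=16)
```

Averaging 16 independent samples shrinks the noise standard deviation by a factor of 4, so the same threshold sits four standard deviations away instead of one, and false triggers become rare.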
FASE: An FPGA-Based Accelerator for Lightweight Sample Entropy With Monte Carlo Sampling
IF 3.1; Zone 2, Engineering Technology; Q2, Computer Science, Hardware & Architecture; Pub Date: 2025-08-06; DOI: 10.1109/TVLSI.2025.3593020
Jiayu Liu; Yuanhang Li; Zhengyang Huang; Chao Chen; Ruiqi Chen; Bruno da Silva
Sample entropy (SampEn) is an information-entropy algorithm that enables effective analysis of biological signals. Because it requires extensive similarity-matching operations, the SampEn calculation is time-consuming. Although a series of fast SampEn algorithms have been proposed, they remain time-intensive when processing large data volumes. Additionally, previous field-programmable gate array (FPGA)-based hardware accelerators designed for SampEn suffer from architectural limitations, consuming substantial on-chip memory resources and operating at low frequencies. In this article, we propose FASE, an FPGA-based accelerator for lightweight sample entropy (LW-SampEn) with Monte Carlo (MC) sampling. The FASE design comprises two main parts: algorithm and hardware optimizations. On the algorithmic side, we introduce MC sampling into the merge-sort-based LW-SampEn algorithm and name the result MCLW-SampEn. MCLW-SampEn effectively reduces the computation load for large data volumes while maintaining algorithmic accuracy. For hardware, we first design efficient sorting and allocation modules to address the boundary localization and load imbalance issues in previous accelerator designs. Then, we replicate the computation across the main phases to enable parallel processing. Finally, we deploy the design on the Pynq-Z2 board for validation. Experimental results show that the proposed MCLW-SampEn algorithm achieves an average speedup of $3\times$ over the LW-SampEn algorithm, with accuracy losses kept within 0.5%. Compared to state-of-the-art (SOTA) designs, FASE achieves an average speedup of $12.8\times$ while reducing power consumption by 89.3%. Ablation studies indicate that, for the same algorithm, FASE offers a $7.4\times$ speedup over related FPGA designs.
Published in vol. 33, no. 10, pp. 2883-2896.
Citations: 0
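To make the cost that FASE targets concrete, the sketch below computes SampEn directly and then with a simple Monte Carlo variant. It assumes the standard definition, SampEn = -ln(A/B), where B and A count template pairs that match within tolerance `r` at lengths `m` and `m + 1`. The Monte Carlo function is only a stand-in that samples random template pairs; it is not the article's merge-sort-based MCLW-SampEn, and all function names are hypothetical.

```python
import math
import random


def _matches(x, m, r, i, j):
    """True if the length-m templates starting at i and j agree within r (Chebyshev)."""
    return all(abs(x[i + k] - x[j + k]) <= r for k in range(m))


def sample_entropy(x, m, r):
    """Direct O(N^2) SampEn: -ln(A/B) over all template pairs i < j."""
    n_templates = len(x) - m  # same template count for lengths m and m + 1
    a = b = 0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            if _matches(x, m, r, i, j):
                b += 1
                if abs(x[i + m] - x[j + m]) <= r:
                    a += 1
    if a == 0 or b == 0:
        raise ValueError("no matches; SampEn undefined for this m, r")
    return -math.log(a / b)


def sample_entropy_mc(x, m, r, n_pairs, seed=0):
    """Monte Carlo estimate: count matches on randomly drawn template pairs only."""
    rng = random.Random(seed)
    n_templates = len(x) - m
    a = b = 0
    for _ in range(n_pairs):
        i, j = rng.sample(range(n_templates), 2)  # two distinct template starts
        if _matches(x, m, r, i, j):
            b += 1
            if abs(x[i + m] - x[j + m]) <= r:
                a += 1
    if a == 0 or b == 0:
        raise ValueError("too few sampled matches; increase n_pairs")
    return -math.log(a / b)
```

The direct version compares every pair of templates, which is what makes long recordings expensive; the sampled version trades a fixed number of comparisons for some estimation error.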
Cross-Layer Approximate Design of Low-Power Fractional Motion Estimation Accelerators for VVC
IF 3.1; Zone 2, Engineering Technology; Q2, Computer Science, Hardware & Architecture; Pub Date: 2025-08-06; DOI: 10.1109/TVLSI.2025.3589984
Rafael da Silva; Pedro T. L. Pereira; Mateus Grellert; Ricardo Reis
The versatile video coding (VVC) standard introduces several innovative tools designed to enhance coding efficiency compared with its predecessors. One example is the adoption of an alternative filter for fractional motion estimation (FME) that is part of the adaptive motion vector resolution (AMVR) extension. While this allows a more precise motion representation, it also adds complexity to hardware implementations that aim to support most VVC features. This work introduces a low-power hardware accelerator architecture specifically designed for FME with support for the AMVR extension of VVC. The proposed solution enables a systematic exploration of the design space of cross-layer approximate computing by combining approximations at both the operator and algorithm levels. This is achieved through the design of two novel architectures (2TAxA/4T and 2TAxA/2TAxA) alongside a newly proposed approximate filter applicable to both regular and alternative interpolation modes. Furthermore, we evaluate eight different approximate adder (AA) topologies to optimize power–quality tradeoffs. Experimental results demonstrate that, for a complete FME multifilter interpolation unit (MIU) maintaining an image quality threshold of $\text{SSIM} \geq 0.88$, our method achieves up to 72% power savings and 59.64% area savings.
Published in vol. 33, no. 9, pp. 2415-2423.
Citations: 0
An Efficient Wide-Voltage Processor With PVTA Tolerance, Voltage Droop Mitigation, and Runtime Ultrafine-Grained Frequency Adaptation
IF 3.1; Zone 2, Engineering Technology; Q2, Computer Science, Hardware & Architecture; Pub Date: 2025-08-06; DOI: 10.1109/TVLSI.2025.3592906
Zhenhao Li; Runze Yu; Xi Deng; Zhaoxu Wang; Haoming Zhang; Zhenglin Liu
Traditional processors require substantial design margins to account for process, voltage, temperature, and aging (PVTA) variations, resulting in significant energy efficiency losses. Existing dynamic timing error detection and correction (EDaC) techniques reduce these margins but incur high area overhead and design complexity. In this brief, we propose a processor based on the RI5CY core that integrates an adaptive voltage frequency scaling (AVFS) system providing PVTA tolerance and voltage droop mitigation. This approach reduces overhead to only 0.065%, enables ultrafine-grained frequency adaptation, and actively mitigates abrupt voltage droops. Additionally, a novel baud rate adaptive UART (BRA-UART) module ensures robust communication across all frequencies. Our processor design achieves a 163% typical performance gain and a 37.7% power reduction in the logic circuitry at near-threshold voltage (NTV), substantially improving energy efficiency.
Published in vol. 33, no. 11, pp. 3196-3200.
Citations: 0
A Novel Low-Loss CMOS Digital Step Attenuator for Low-Power Scalable Phased Array Systems
IF 3.1; Zone 2, Engineering Technology; Q2, Computer Science, Hardware & Architecture; Pub Date: 2025-08-06; DOI: 10.1109/TVLSI.2025.3592460
Nengxu Zhu; Yiting Zhang; Fanyi Meng
This brief presents a novel compact, low-loss CMOS digital attenuator chip for low-power, scalable phased array systems. To achieve low-loss amplitude control at high-millimeter-wave (mmW) frequencies with minimal area overhead, the proposed design employs a novel series triple coupled transformer (STCT) structure, integrated with an impedance-tunable network (ITN) to enable multibit control within a compact footprint. Fabricated in a 65-nm bulk CMOS technology, the 5-bit attenuator chip demonstrates an insertion loss (IL) as low as 3.3 dB across 110–130 GHz, with a 0.5-dB step resolution and an amplitude control range of 15.5 dB. The measured rms amplitude and phase errors are 0.4 dB and 5.2°, respectively. The core area of the chip is only 0.06 mm$^2$.
本文介绍了一种用于低功耗、可扩展相控阵系统的新型CMOS紧凑、低损耗数字衰减芯片。为了以最小的面积开销实现高毫米波(mmW)频率下的低损耗幅度控制,该设计采用了一种新颖的串联三耦合变压器(STCT)结构,与阻抗可调网络(ITN)集成在一起,在紧凑的占地面积内实现多位控制。该5位衰减器芯片采用65纳米体CMOS技术制造,在110-130 GHz范围内的插入损耗(IL)低至3.3 dB,阶跃分辨率为0.5 dB,幅度控制范围为15.5 dB。测量的均方根幅值和相位误差分别减小到0.4 dB和5.2°。芯片的核心面积仅为0.06 mm2。
{"title":"A Novel Low-Loss CMOS Digital Step Attenuator for Low-Power Scalable Phased Array Systems","authors":"Nengxu Zhu;Yiting Zhang;Fanyi Meng","doi":"10.1109/TVLSI.2025.3592460","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3592460","url":null,"abstract":"This brief presents a novel CMOS compact, low-loss digital attenuator chip for low-power, scalable phased array systems. To achieve low-loss amplitude control at high-millimeter-wave (mmW) frequencies with minimal area overhead, the proposed design employs a novel series triple coupled transformer (STCT) structure, integrated with an impedance-tunable network (ITN) to enable multibit control within a compact footprint. Fabricated using a 65-nm-bulk CMOS technology, the 5-bit attenuator chip demonstrates an insertion loss (IL) as low as 3.3 dB across the 110–130 GHz with a 0.5-dB step resolution and an amplitude control range of 15.5 dB. The measured rms amplitude and phase errors are minimized to 0.4 dB and 5.2°, respectively. The core area of the chip is only 0.06 mm<sup>2</sup>.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 11","pages":"3191-3195"},"PeriodicalIF":3.1,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145398670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adaptive Confidence Interval-Based Alternate Test for Reliability Enhancement
IF 3.1 Zone 2, Engineering & Technology Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2025-08-01 DOI: 10.1109/TVLSI.2025.3591196
Jiaming Zhao;Shibo Chen;Naixin Zhou;Yijiu Zhao;Guibing Zhu
The estimation model-based alternate test strategy for analog integrated circuits (ICs) offers an effective way to reduce test costs. However, because the method is data-driven, its estimation process is opaque, which lowers test reliability. To address this problem, we propose an adaptive confidence interval-based alternate test (ACIT) that enhances the reliability of alternate test. Multiple estimators synchronously generate the target parameters of each circuit, and all estimations for the same sample are averaged to obtain the final result. The reliability of each final estimation is evaluated by comparing its adaptive confidence interval with the corresponding parameter boundary. The adaptive confidence interval is obtained from the variance of the estimator outputs and the estimation-boundary distance. Estimations whose confidence intervals cross a boundary are flagged as suspect and sent back for retesting by the conventional approach. The remaining results are classified as “pass” (entire confidence interval within the qualified range) or “fail” (entire confidence interval outside the qualified range). The approach is studied with simulation data and verified on commercial ICs. Results demonstrate that ACIT can effectively eliminate misclassified circuits by identifying unreliable estimations.
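The three-way decision described in the abstract can be sketched roughly as follows. The abstract does not give the interval formula, so this sketch makes a loud assumption: the half-width is a fixed multiple `k` of the sample standard deviation of the estimator outputs, and the distance-dependent adaptation is omitted. The names `classify`, `k`, `lower`, and `upper` are hypothetical and not from the paper.

```python
from statistics import mean, stdev


def classify(estimates, lower, upper, k=2.0):
    """Classify one circuit parameter from several estimator outputs.

    estimates: outputs of the individual estimators for the same sample
    lower, upper: limits of the qualified range for the parameter
    k: interval width multiplier (an assumption, not the paper's formula)
    """
    center = mean(estimates)            # averaged final estimation
    half_width = k * stdev(estimates)   # spread of the estimator outputs
    ci_low, ci_high = center - half_width, center + half_width

    if lower <= ci_low and ci_high <= upper:
        return "pass"      # entire interval inside the qualified range
    if ci_high < lower or ci_low > upper:
        return "fail"      # entire interval outside the qualified range
    return "suspect"       # interval crosses a boundary: retest conventionally


print(classify([1.0, 1.1, 0.9], lower=0.0, upper=2.0))
print(classify([1.9, 2.0, 1.8], lower=0.0, upper=2.0))
```

Only the circuits returned as "suspect" would go back to conventional specification testing, which is where the cost saving of the alternate test is partly traded for reliability.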
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 11, pp. 3176-3185.
Citations: 0