IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles (Page 9)

A Double-Data-Rate Ripple Counter With Calibration Circuits for Correlated Multiple Sampling in CMOS Image Sensors

IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Pub Date : 2024-09-04 DOI: 10.1109/TVLSI.2024.3449320

Wanbin Zha;Jiangtao Xu;Kaiming Nie;Zhiyuan Gao

This brief presents a double-data-rate (DDR) ripple counter with calibration circuits for correlated multiple sampling (CMS) in CMOS image sensors (CISs). It analyzes a specific type of least significant bit (LSB) error that obstructs the recording of the prior LSB count result during continuous counting when performing digital correlated double sampling (DDS) and digital CMS. This error stems from the transparent characteristic of the LSB in the DDR counter and increases random noise. A calibration circuit is presented to correct the LSB error; it achieves carry propagation and retains the remainder by recording the result of the LSB after each quantization. Based on simulation results across different CMS iterations, the random noise is reduced by 25.5% after calibration. A $1280 \times 1024$ prototype CIS is fabricated in a 110-nm 1P4M process. The experimental results show that in DDS mode, the CIS random noise is $147~\mu\text{V}_{\text{rms}}$ and the ADC power consumption is $21.07~\mu$W, whereas at CMS = 2, the noise is $118~\mu\text{V}_{\text{rms}}$ and the power consumption is $28.07~\mu$W. In addition, the prototype CIS has a column FPN of 0.006%.
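
As a rough check on the measured figures, the sketch below computes the relative noise and power change between DDS mode and CMS = 2, and compares the measured noise ratio with the ideal 1/sqrt(M) scaling that averaging M samples of uncorrelated noise would give. The 1/sqrt(M) model and all names in the code are illustrative assumptions, not taken from the brief.

```python
import math

def relative_change(before, after):
    """Signed relative change from `before` to `after` (0.1 means +10%)."""
    return (after - before) / before

# Measured values quoted in the abstract.
noise_dds_uv, noise_cms2_uv = 147.0, 118.0   # random noise, uV rms
power_dds_uw, power_cms2_uw = 21.07, 28.07   # ADC power, uW

noise_change = relative_change(noise_dds_uv, noise_cms2_uv)
power_change = relative_change(power_dds_uw, power_cms2_uw)

# Assumption: averaging M = 2 samples of uncorrelated noise scales
# the rms noise by 1/sqrt(2).
ideal_ratio = 1.0 / math.sqrt(2)
measured_ratio = noise_cms2_uv / noise_dds_uv

print(f"noise change: {noise_change:+.1%}")
print(f"power change: {power_change:+.1%}")
print(f"measured noise ratio {measured_ratio:.3f} vs ideal {ideal_ratio:.3f}")
```

Under these assumptions the measured noise drops by roughly a fifth while the ADC power rises by roughly a third, and the measured ratio stays above the ideal one, which is what one would expect if part of the noise is not averaged away by CMS.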

Vol. 33, No. 2, pp. 568-572

Citations: 0

Unveiling the True Power of the Latched Ring Oscillator for a Unified PUF and TRNG Architecture

Pub Date : 2024-09-04 DOI: 10.1109/TVLSI.2024.3448503

Riccardo Della Sala;Davide Bellizia;Giuseppe Scotti

This work proposes using the latched ring oscillator (LRO) as a reconfigurable entropy source that outperforms the existing literature on both physical unclonable functions (PUFs) and true random number generators (TRNGs). The PUF working principle and mathematical model are presented in this manuscript for the first time, together with its performance measured on FPGA. The proposed LRO-based PUF is $2\times$ more compact than state-of-the-art PUFs on FPGA. The LRO TRNG architecture has been revisited, and an XOR-tree-based postprocessing technique has been introduced to increase the throughput from 0.76 up to 800 Mbit/s, paving the way for a novel class of high-throughput reconfigurable entropy sources. The results of NIST tests, carried out also under supply voltage and temperature variations, have demonstrated robust key extraction and secure random number generation for different applications. This comprehensive proposal aims to advance the state of the art in compact and high-throughput entropy sources, catering to the increasing demands of modern cryptographic hardware.
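
To illustrate why an XOR tree is a natural postprocessing step for parallel entropy sources, here is a minimal software sketch. It combines several bit streams column-wise with a balanced XOR tree and evaluates the textbook piling-up lemma for independent biased bits. This is a generic model under an independence assumption; the function names are hypothetical and the code does not reproduce the authors' circuit.

```python
def xor_tree(streams):
    """Combine equally long bit streams column-wise with a balanced XOR tree.

    Requires at least one stream.
    """
    layer = [list(s) for s in streams]
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            nxt.append([a ^ b for a, b in zip(layer[i], layer[i + 1])])
        if len(layer) % 2:          # an odd stream passes through to the next level
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

def xor_prob_one(probs):
    """P(XOR = 1) for independent bits with P(bit_i = 1) = probs[i]."""
    prod = 1.0
    for p in probs:
        prod *= 1.0 - 2.0 * p       # piling-up lemma: the biases multiply
    return (1.0 - prod) / 2.0

if __name__ == "__main__":
    print(xor_tree([[1, 0, 1], [1, 1, 0], [0, 0, 0], [1, 1, 1]]))
    print(xor_prob_one([0.6] * 4))
```

In this model, XOR-ing four independent sources that each output 1 with probability 0.6 yields a bit that is much closer to unbiased, which is why combining many raw sources can relax the quality demanded of each one.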

Vol. 32, No. 12, pp. 2403-2407

Citations: 0

ReAdapt-II: Energy-Quality Optimizations for VLSI Adaptive Filters Through Automatic Reconfiguration and Built-In Iterative Dividers

Pub Date : 2024-09-04 DOI: 10.1109/TVLSI.2024.3446235

Pedro T. L. Pereira;Patrícia Ucker L. Costa;Eduardo da Costa;Paulo Flores;Sergio Bampi

Adaptive filters using least mean square (LMS) algorithms offer high precision, low complexity, and fast convergence, but choosing the correct algorithm can be difficult and time-consuming. In this brief, we present ReAdapt-II, a VLSI circuit that enhances energy efficiency in adaptive filters through automatic reconfiguration and built-in iterative dividers, optimizing the energy-quality (EQ) tradeoff. This design features a self-selecting, reconfigurable hardware system with four adaptive algorithms, integrating iterative-based dividers and reusing arithmetic operators. Our results show a minimum energy consumption reduction of 39.75%, a 66.61% reduction in the circuit area, and a maximum accuracy increase of 17.07% compared with the previous ReAdapt architecture.
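
To show where a divider enters an LMS-family filter, the sketch below implements one update step of plain LMS and of normalized LMS (NLMS) and uses both to identify a short FIR system. The abstract does not name the four algorithms in ReAdapt-II, so the choice of LMS and NLMS, the helper names, and the step size are assumptions made for illustration only.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lms_step(w, x, d, mu):
    """One LMS update: w <- w + mu * e * x, with error e = d - w.x."""
    e = d - dot(w, x)
    return [wi + mu * e * xi for wi, xi in zip(w, x)], e

def nlms_step(w, x, d, mu, eps=1e-8):
    """One NLMS update; the step is divided by the input energy."""
    e = d - dot(w, x)
    norm = eps + dot(x, x)   # this division is what a hardware divider would compute
    return [wi + (mu / norm) * e * xi for wi, xi in zip(w, x)], e

def identify(step, taps_true, n_iter=2000, mu=0.5, seed=0):
    """Adapt a filter to a noiseless FIR system with taps `taps_true`."""
    rng = random.Random(seed)
    n = len(taps_true)
    w = [0.0] * n
    x = [0.0] * n
    for _ in range(n_iter):
        x = [rng.uniform(-1.0, 1.0)] + x[:-1]   # shift in a new input sample
        d = dot(taps_true, x)                   # desired response
        w, _ = step(w, x, d, mu)
    return w

if __name__ == "__main__":
    print(identify(lms_step, [0.5, -0.25]))
    print(identify(nlms_step, [0.5, -0.25]))
```

Only the NLMS step needs a division per update, which is one plausible reason for building iterative dividers into hardware that supports several LMS variants.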

Citations: 0

A Second-Order Noise Shaping SAR ADC With Parallel Multiresidual Integrator

Pub Date : 2024-08-30 DOI: 10.1109/TVLSI.2024.3447740

Yang Zhou;Wenjie Wang;Longbin Zhu;Zhengtao Zhu;Risheng Su;Jianan Zheng;Siyuan Xie;Jihong Li;Fanyi Meng;Zhijun Zhou;Keping Wang

This brief proposes a parallel multiresidual (PMR) integrator to enhance the noise-shaping (NS) effect in successive approximation register (SAR) analog-to-digital converters (ADCs). The PMR employs passive integrators in parallel to simultaneously integrate the average of multiple sequential residual voltages. The proposed PMR technique offers a way to enhance NS without increasing the order of the integrator, which suppresses instability and power consumption. A prototype 7-bit second-order NS-SAR ADC is designed and simulated in a 130-nm CMOS process. PMR increases the effective number of bits (ENOB) to 10.6 bits, corresponding to an NS enhancement of 3.6 bits. It achieves a peak signal-to-noise-and-distortion ratio (SNDR) of 65.84 dB over a bandwidth of 1.3 kHz at an oversampling ratio (OSR) of 16.
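
The reported ENOB and SNDR can be cross-checked with the standard conversion ENOB = (SNDR - 1.76 dB) / 6.02 dB. The sketch below applies it and also derives the sampling rate implied by the bandwidth and OSR, assuming the usual definition OSR = fs / (2 * BW). The function names are hypothetical, and the sampling rate is a derived value that the abstract does not state.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (full-scale sine-wave input)."""
    return (sndr_db - 1.76) / 6.02

def sample_rate_hz(bandwidth_hz, osr):
    """Sampling rate implied by OSR = fs / (2 * BW)."""
    return 2.0 * bandwidth_hz * osr

if __name__ == "__main__":
    print(round(enob_from_sndr(65.84), 2))   # ENOB for the reported peak SNDR
    print(sample_rate_hz(1.3e3, 16))         # implied sampling rate in Hz
```

The conversion gives about 10.6 bits for 65.84 dB, which matches the ENOB quoted in the abstract.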

Citations: 0

A 0.05–1.5-GHz PVT-Insensitive Digital-to-Time Converter for QKD Applications

Pub Date : 2024-08-30 DOI: 10.1109/TVLSI.2024.3447111

Haiyue Yan;Yan Ye;Wenjia Li;Xuefei Bai

This work introduces a dual-channel digital-to-time converter (DTC) featuring a broad tuning range, which utilizes a dual delay-locked loop (DLL) architecture to achieve clock or data deskewing and precise timing adjustment effectively. The coarse- and fine-tuning mechanisms are operated in precise closed-loop schemes to lessen the effects of the ambient variations. The replica fine voltage-controlled delay line can provide subgate resolution and instantaneous switching capability. Then, the replica coarse voltage-controlled delay line can provide a wide dynamic delay range. The proposed DTC can generate variable delays for an arbitrary pseudorandom data rate of up to 3 Gb/s and is insensitive to process and temperature variation. The test chip, fabricated in a 55-nm CMOS process, operates from 0.05 to 1.5 GHz and achieves a timing resolution of 9.77 ps, a power consumption of 12 mW, and an area of 0.76 mm$^{2}$. The measured maximum integral nonlinearity (INL) is 2.20 LSB in an extended delay mode. In the dual delay mode, the maximum INL of channels 0 and 1 is 1.60 and −1.08 LSB, respectively.
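
For readers who want to relate the INL figures to time, the sketch below computes an endpoint-fit INL from a list of measured delays and converts an INL value from LSB to picoseconds using the reported 9.77 ps resolution. The abstract does not say which INL definition was used, so the endpoint fit, the sample data, and the function names are assumptions.

```python
def inl_lsb(delays_ps):
    """Endpoint-fit INL, in LSB, for delays measured at consecutive codes."""
    n = len(delays_ps)
    if n < 2:
        raise ValueError("need at least two codes")
    lsb = (delays_ps[-1] - delays_ps[0]) / (n - 1)   # average step size
    return [(d - (delays_ps[0] + k * lsb)) / lsb for k, d in enumerate(delays_ps)]

def lsb_to_ps(value_lsb, resolution_ps=9.77):
    """Convert a value in LSB to picoseconds for a given timing resolution."""
    return value_lsb * resolution_ps

if __name__ == "__main__":
    print(inl_lsb([0.0, 10.0, 22.0, 30.0]))   # made-up delays, for illustration
    print(lsb_to_ps(2.20))                     # reported worst-case INL, in ps
```

With the reported resolution, the 2.20 LSB worst case in the extended delay mode corresponds to roughly 21.5 ps.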

Citations: 0

Power-Efficient Analog Hardware Architecture of the Learning Vector Quantization Algorithm for Brain Tumor Classification

Pub Date : 2024-08-30 DOI: 10.1109/TVLSI.2024.3447903

Vassilis Alimisis;Emmanouil Anastasios Serlis;Andreas Papathanasiou;Nikolaos P. Eleftheriou;Paul P. Sotiriadis

This study introduces a design methodology for an analog hardware architecture that implements the learning vector quantization (LVQ) algorithm. It comprises three main approaches, distinguished by their distance calculation circuit (DCC): Euclidean distance, Sigmoid function, and Squarer circuits. The main building blocks of each approach are the DCC and the current comparator (CC). The operational principles of the architecture are extensively elucidated and put into practice through a power-efficient configuration (consuming less than 650 nW) within a low-voltage setup (0.6 V). Each specific implementation is tested on a brain tumor classification task, achieving more than 96.00% classification accuracy. The designs are realized using a 90-nm CMOS process and developed utilizing the Cadence IC Suite for both schematic and physical design. Through a comparative analysis of postlayout simulation outcomes with an equivalent software-based classifier and related works, the accuracy of the applied modeling and design methodologies is validated.
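
For orientation, a software model of LVQ can be sketched in a few lines. The code below implements nearest-prototype classification with a squared Euclidean distance (roughly the role of the DCC) followed by a winner-take-all selection (roughly the role of the CC), plus the classic LVQ1 training rule. The mapping to the analog blocks, the LVQ1 variant, and all names are assumptions for illustration; the abstract does not specify the training procedure.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def winner(protos, x):
    """Index of the prototype closest to x (winner-take-all)."""
    return min(range(len(protos)), key=lambda k: sq_dist(protos[k][0], x))

def classify(protos, x):
    """Label of the nearest prototype; protos is a list of (vector, label)."""
    return protos[winner(protos, x)][1]

def lvq1_update(protos, x, label, lr):
    """LVQ1: pull the winner toward x if labels match, push it away otherwise."""
    i = winner(protos, x)
    vec, lab = protos[i]
    sign = 1.0 if lab == label else -1.0
    protos[i] = ([v + sign * lr * (xi - v) for v, xi in zip(vec, x)], lab)
    return i

if __name__ == "__main__":
    protos = [([0.0, 0.0], "a"), ([1.0, 1.0], "b")]
    print(classify(protos, [0.2, 0.1]))
    lvq1_update(protos, [0.2, 0.0], "a", lr=0.5)
    print(protos[0])
```

At inference time only `classify` is needed, so a hardware classifier reduces to distance computation and a comparison stage once the prototypes are trained.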

Vol. 32, No. 11, pp. 1969-1982

Citations: 0

HPR-Mul: An Area and Energy-Efficient High-Precision Redundancy Multiplier by Approximate Computing

Pub Date : 2024-08-29 DOI: 10.1109/TVLSI.2024.3445108

Jafar Vafaei;Omid Akbari

For critical applications that require a higher level of reliability, the triple modular redundancy (TMR) scheme is usually employed to implement fault-tolerant arithmetic units. However, this method imposes a significant area and power/energy overhead. Also, the majority-based voter in typical TMR designs is highly sensitive to soft errors and to the design diversity of the triplicated module, which may result in an error for a small difference between the outputs of the TMR modules. At the same time, a wide range of applications deployed in critical systems are inherently error-resilient, that is, they can tolerate some inexact results at their output while having a given level of reliability. In this article, we propose a high precision redundancy multiplier (HPR-Mul) that relies on the principles of approximate computing to achieve higher energy efficiency and lower area, as well as resolve the aforementioned challenges of the typical TMR schemes, while retaining the required level of reliability. The HPR-Mul is composed of a full-precision (FP) multiplier and two reduced-precision (RP) multipliers, along with a simple voter to determine the output. Unlike the state-of-the-art RP redundancy multipliers (RPR-Muls) that require a complex voter, the voter of the proposed HPR-Mul is designed based on mathematical formulas, resulting in a simpler structure. Furthermore, we use the intermediate signals of the FP multiplier as the inputs of the RP multipliers, which significantly enhances the accuracy of the HPR-Mul. The efficiency of the proposed HPR-Mul is evaluated in a 15-nm FinFET technology, where the results show up to 70% and 69% lower power consumption and area, respectively, compared to the typical TMR-based multipliers. Also, the HPR-Mul outperforms the state-of-the-art RPR-Mul by achieving up to 84% higher soft error tolerance. Moreover, by employing the HPR-Mul in different image processing applications, up to 13% higher output image quality is achieved in comparison with the state-of-the-art RPR multipliers.
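
The sensitivity of a majority voter to small output differences can be seen in a bitwise 2-of-3 voter, a common TMR construction. The sketch below is a generic illustration with a hypothetical function name; it is not the HPR-Mul voter, whose formulas the abstract does not give.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority of three unsigned integers."""
    return (a & b) | (a & c) | (b & c)

if __name__ == "__main__":
    # Two identical modules outvote a faulty third one.
    print(majority_vote(42, 42, 7))
    # Three slightly different (diverse) outputs: the voted word
    # need not match any module's result.
    print(majority_vote(5, 6, 3))
```

In the second call the modules output 5, 6, and 3, yet the bitwise vote yields 7, a value that none of them produced. This is the kind of failure that motivates voters which compare magnitudes rather than individual bits.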

Vol. 32, No. 11, pp. 2012-2022

Citations: 0

A CMOS Readout Circuit for Resistive Tactile Sensor Array Using Crosstalk Suppression and Nonuniformity Compensation Techniques

Pub Date : 2024-08-27 DOI: 10.1109/TVLSI.2024.3447164

Yao Li;Junfeng Geng;Mao Ye;Jiaji He;Xiaoxiao Zheng;Qiuwei Wang;Yiqiang Zhao

This article presents a novel readout circuit for the resistive tactile sensor array. Based on the 2-D scanning mechanism, a crosstalk suppression technique is proposed by combining the correlated double sampling (CDS) and zero potential method (ZPM). The output of the same sensor under different bias conditions is captured twice and amplified by a channel-parallel fully differential gain stage, performing analogous subtraction. To achieve nonuniformity compensation, the current injected into the readout channel is adjusted by the channel-parallel digital-to-analog converter (DAC). A successive approximation register (SAR) analog-to-digital converter (ADC) performs quantization, and the chip can be used as a serial peripheral interface (SPI) slave to update register values for gain configuration, power consumption control, and nonuniformity compensation. The 180-nm CMOS prototype chip occupies an area of $4.8~\text{mm}^{2}$ and consumes $285~\mu$W. In order to validate the design, a tactile sensing system is built, using the readout circuit along with a $10 \times 10$ flexible sensor array. With the techniques proposed in this article, the readout error of the sensors in the array is less than 0.3‰.

Vol. 32, No. 12, pp. 2368-2376

Citations: 0

Spread Spectrum-Based Countermeasures for Cryptographic RISC-V SoC

Pub Date : 2024-08-27 DOI: 10.1109/TVLSI.2024.3444851

Thai-Ha Tran;Ba-Anh Dao;Duc-Hung Le;Van-Phuc Hoang;Trong-Thuc Hoang;Cong-Kha Pham

Side-channel analysis attacks have become the primary method for exploiting the vulnerabilities of cryptographic devices. Therefore, countermeasures that enhance the security level of these implementations have become even more urgent. This article proposes a time-based hiding countermeasure using spread-spectrum signals. In our RISC-V system on chip (SoC), cryptographic accelerators are driven by random dynamic frequency-hopping signals. We found 223 available parameter sets for a Xilinx Mixed-Mode Clock Manager (MMCM) primitive in spread-spectrum mode and achieved better effectiveness in the occupied bandwidth (OBW) metric. The MMCM output signal and the range of frequencies within the spread are changed randomly, resulting in multiple clocks for individual encryption. The effectiveness of this proposal is demonstrated by conducting realistic side-channel attacks (SCAs) and state-of-the-art leakage assessment methodologies on an accelerator for the well-known Advanced Encryption Standard (AES). Even with up to five million power traces, the test results show that our defense withstands a regular correlation power analysis (CPA) attack as well as CPA attacks with alignment preprocessing, such as those that use a sliding window or an amplitude peak location algorithm. Furthermore, the t-test methodology cannot detect any first-order information leakage in five million traces; meanwhile, the deep learning leakage assessment (DLLA) requires nearly one million power traces in the training test to detect leakage points.
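
The t-test leakage assessment mentioned above is commonly based on Welch's t-statistic between two groups of power measurements (for example, fixed versus random plaintexts), evaluated at each sample point of the traces. The sketch below shows the statistic for a single sample point. The |t| > 4.5 threshold is the conventional choice in such assessments and is an assumption here, as are the function names; the abstract does not state the exact test setup.

```python
import math
from statistics import mean, variance

TVLA_THRESHOLD = 4.5   # conventional |t| limit; assumed, not taken from the abstract

def welch_t(group_a, group_b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)   # sample variances
    return (mean(group_a) - mean(group_b)) / math.sqrt(va / na + vb / nb)

def leaks(group_a, group_b, threshold=TVLA_THRESHOLD):
    """True if the two groups are distinguishable at this sample point."""
    return abs(welch_t(group_a, group_b)) > threshold

if __name__ == "__main__":
    print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
    print(leaks([0.0, 0.1, 0.0, 0.1] * 5, [1.0, 1.1, 1.0, 1.1] * 5))
```

A hiding countermeasure aims to keep this statistic below the threshold at every sample point, even as the number of traces grows.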

Vol. 32, No. 12, pp. 2341-2354

Citations: 0

Detect and Replace: Efficient Soft Error Protection of FPGA-Based CNN Accelerators

IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Pub Date : 2024-08-26 DOI: 10.1109/TVLSI.2024.3443834

Zhen Gao;Yanmao Qi;Jinchang Shi;Qiang Liu;Guangjun Ge;Yu Wang;Pedro Reviriego

Convolutional neural networks (CNNs) are widely used in computer vision and natural language processing, and field-programmable gate arrays (FPGAs) are a popular platform for accelerating them. However, FPGAs are prone to soft errors, so the reliability of FPGA-based CNN accelerators becomes a key problem in safety-critical applications. The convolution module, built on a processing element (PE) array, is the most complex part of the accelerator and is therefore the key to efficient protection. Coding-based schemes have been proposed to protect the convolution module efficiently: the processing of the PE array is modeled as parallel matrix-vector multiplications (MVMs), and every wrong output is concurrently detected and corrected. However, these schemes cannot deal with errors in the configuration memory, which affect many intermediate results. This article proposes a protection scheme based on faulty PE detection and replacement (DR) to deal with such configuration memory errors. The DR scheme is implemented on a CNN accelerator based on a Xilinx Zynq 7000 SoC, and fault injection (FI) experiments are performed to evaluate it. The results show that the scheme effectively mitigates the effect of soft errors in the configuration memory at a cost of about 1.3 times the complexity and 1.4 times the power consumption of the unprotected PE array. Compared with the advanced checksum-of-checksum (CoC) scheme, the DR scheme decreases power consumption by up to 30%.
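The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: a checksum over a matrix-vector multiplication flags a wrong output, and the faulty PE is then located and its result replaced by that of a spare. It assumes integer arithmetic, at most one faulty PE, and a trusted checksum path. The names `protected_mvm`, `checksum_row`, `pes`, and `spare` are hypothetical, and this is not the authors' DR scheme.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[int]
Matrix = List[List[int]]
PE = Callable[[Vector, Vector], int]  # a PE maps (weight row, input) to one output


def dot(row: Vector, x: Vector) -> int:
    """Multiply-accumulate as performed by a healthy PE."""
    return sum(w * v for w, v in zip(row, x))


def checksum_row(weights: Matrix) -> Vector:
    """Column-wise sum of all weight rows, used as one extra checksum row."""
    return [sum(col) for col in zip(*weights)]


def protected_mvm(
    weights: Matrix, x: Vector, pes: List[PE], spare: PE
) -> Tuple[Vector, Optional[int]]:
    """Compute weights @ x, one row per PE, with detection and replacement.

    Returns the output vector and the index of the PE that was replaced,
    or None if the checksum matched. Assumes at most one faulty PE.
    """
    outputs = [pe(row, x) for pe, row in zip(pes, weights)]

    # Detection: the sum of all outputs must equal checksum_row . x.
    # Assumption: the checksum itself is computed on a trusted path.
    if sum(outputs) == dot(checksum_row(weights), x):
        return outputs, None

    # Location and replacement: recompute each row on the spare PE and
    # substitute the first result that disagrees.
    for i, row in enumerate(weights):
        good = spare(row, x)
        if good != outputs[i]:
            outputs[i] = good
            return outputs, i
    return outputs, None


def faulty_pe(row: Vector, x: Vector) -> int:
    """Hypothetical fault model: one output bit is always flipped."""
    return dot(row, x) ^ 4
```

In this sketch the checksum costs one extra row of multiply-accumulates per MVM, and the spare PE is only exercised after a mismatch. A real design would also have to cover faults in the checker and in the spare.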


Volume 33, Issue 1, Pages 66-74

Citations: 0