IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

Toward High-Performance Network Coding: FPGA Acceleration With Bounded-Value Generators
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-06-02 DOI: 10.1109/TVLSI.2025.3572517
Jiaxin Qing;Philip H. W. Leong;Kin-Hong Lee;Raymond W. Yeung
Network coding enhances performance in network communications and distributed storage by increasing throughput and robustness while reducing latency. Batched sparse (BATS) codes are a class of capacity-achieving network codes, but their practical application is hindered by their structure and by the computational intensity and power demands of finite field (FF) operations. Most of the literature focuses on algorithm-level techniques to improve coding efficiency, while optimization through algorithm/hardware co-design has long been neglected. Leveraging the unique structure of BATS codes, we first present cyclic-shift BATS (CS-BATS), a hardware-friendly variant. Next, we propose a simple but effective bounded-value (BV) generator that reduces the size of a finite field multiplier by up to 70%. Finally, we report on a scalable and resource-efficient field-programmable gate array (FPGA)-based network coding accelerator that achieves a throughput of 27 Gb/s, a speedup of more than 300× over software.
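To give a sense of why finite field multiplication is costly in software, the sketch below multiplies two elements of GF(2^8) by shift-and-add. The abstract does not state which field or reduction polynomial the accelerator uses, so the field size, the polynomial `0x11B`, and the function name `gf256_mul` are all assumptions chosen for illustration. This is not the paper's bounded-value generator or its multiplier design.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two GF(2^8) elements by shift-and-add.

    Assumption: the field is GF(2^8) with reduction polynomial
    x^8 + x^4 + x^3 + x + 1 (0x11B). The paper's field may differ.
    """
    result = 0
    while b:
        if b & 1:          # add (XOR) the current multiple of a
            result ^= a
        b >>= 1
        a <<= 1            # multiply a by x
        if a & 0x100:      # degree reached 8: reduce modulo poly
            a ^= poly
    return result


print(hex(gf256_mul(0x57, 0x83)))  # 0xc1
```

Each product takes up to eight dependent shift, test, and XOR steps, which is the kind of per-symbol cost that a dedicated hardware multiplier removes.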
Journal Article. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 8, pp. 2274-2287.
Citations: 0
A Compact Power-on-Reset Circuit With Configurable Brown-Out Detection
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-04-30 DOI: 10.1109/TVLSI.2025.3561131
Yoochang Kim;Jun-Eun Park;Kwanseo Park;Young-Ha Hwang
A compact power-on-reset (POR) circuit with a configurable brown-out reset (BOR) function is presented. An integrated voltage reference (VR) circuit provides a constant bias voltage that facilitates voltage-triggered POR/BOR operation, reliably preventing POR signal generation when the ramping supply voltage ($V_{\text{DD}}$) level is too low. Moreover, the proposed POR circuit features a fast, configurable POR/BOR operation owing to an inverter-based trip point detector (TPD), which triggers the reset signal with a programmable trip point. The prototype POR circuit achieves a POR level higher than 752 mV with a maximum POR delay of 16.4 μs at a 0.8–1.2-V $V_{\text{DD}}$, supporting a wide range of supply ramping time from 1 μs to 1 s. In addition, the prototype detects brown-out events with a supply drop of 0.1–0.4 V, generating the BOR signal. Designed using a 28-nm CMOS process, the prototype has a compact active area of 995.3 μm² and a quiescent current of 162–974 nA at a 1-V $V_{\text{DD}}$.
Journal Article. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 2074-2078.
Citations: 0
Efficient Partial Recomputation-Based Fault Detection Approaches for Z-transform
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-04-30 DOI: 10.1109/TVLSI.2025.3560154
Saeed Aghapour;Kasra Ahmadi;Mehran Mozaffari Kermani;Reza Azarderakhsh
The Z-transform is a fundamental and powerful tool widely used in signal processing and in other applications such as communications and networking. By analyzing the Z-transform of a signal, one can extract critical information about its stability, causality, frequency response, energy and power, and overall behavior. However, errors caused either by environmental changes or by malicious injections in very-large-scale integration (VLSI) implementations can critically compromise the integrity and reliability of its output. Failure to detect such faults may result in unpredictable, erroneous, and misleading function analyses. Therefore, the ability to detect soft errors and faults before accepting the results is of paramount importance. In this article, we propose an efficient fault detection method that combines algorithmic-level checks with partial recomputation to identify both transient and permanent faults with a high error coverage rate across various injection scenarios. The AMD/Xilinx field-programmable gate array (FPGA) implementation of our design demonstrated only a modest increase in time and area overhead. To the best of our knowledge, fault detection for the Z-transform function has not been previously studied.
Journal Article. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 1983-1993.
Citations: 0
A Universal Sequential Authentication Scheme for TAPC-Based Test Standards
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-04-29 DOI: 10.1109/TVLSI.2025.3562015
Guan-Rong Chen;Kuen-Jong Lee
Integrated circuits (ICs) have become extremely complex nowadays. Therefore, multiple test standards could be employed to handle different testing scenarios. Unfortunately, this also leads to serious security problems since attackers can exploit the excellent controllability and observability of test standards to steal confidential information or disrupt the circuit’s functionality. This article proposes a universal sequential authentication scheme that is compatible with test standards employing the test access port controller (TAPC) defined in IEEE Std 1149.1. The main objective is to protect multiple TAPC-based test standards with a universal security module. In this scheme, only authorized test data can be updated to the target register to control the corresponding test standard, and only the response to authorized test data can be output. The key idea is to generate different authentication keys for different test data, and even with the same set of test data, if their input sequences are different, their authentication keys will also be different. Furthermore, we develop an irreversible obfuscation mechanism to generate fake output data to confuse attackers. Due to its irreversibility, the original correct output data cannot be deduced from the fake output data. Experimental results on a typical processor, i.e., SCR1, show that the proposed scheme causes no time overhead, and the area overhead is only 1.74%.
Journal Article. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 1972-1982.
Citations: 0
A Novel High-Throughput FFT Processor With a Block-Level Pipeline for 5G MIMO OFDM Systems
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-04-28 DOI: 10.1109/TVLSI.2025.3558947
Meiyu Liu;Zhijun Wang;Hanqing Luo;Shengnan Lin;Liping Liang
In fifth-generation (5G) communication systems, multiple input multiple output (MIMO) and orthogonal frequency-division multiplexing (OFDM) are two critical technologies. The fast Fourier transform (FFT), as the core processing step of OFDM, directly affects the overall system performance. In this brief, we propose a novel block-level pipelined architecture, which divides the FFT processor into three pipeline blocks: input, radix, and output. Each pipeline block can work on a different FFT simultaneously to achieve higher throughput. Specifically, to reduce the OFDM system-level latency of 5G applications, the FFT processor supports weighted overlap and add (WOLA) on the cyclic prefix and suffix of OFDM symbols. This architecture is implemented using TSMC 12-nm technology, with a processor die area of 0.89 mm² and a power consumption of 568 mW at 1 GHz. The FFT processor can achieve a system-level throughput of up to 2.66 GS/s.
Journal Article. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 2059-2063.
Citations: 0
A 0.6-V 9.38-Bit 6.9-kS/s Capacitor-Splitting Bypass Window SAR ADC for Wearable 12-Lead ECG Acquisition Systems
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-04-28 DOI: 10.1109/TVLSI.2025.3559669
Kangkang Sun;Jingjing Liu;Feng Yan;Yuan Ren;Ruihuang Wu;Bingjun Xiong;Zhipeng Li;Jian Guan
This article proposes a fully differential ten-bit energy-efficient successive approximation register (SAR) analog-to-digital converter (ADC) for wearable 12-lead electrocardiogram (ECG) acquisition systems. The proposed ADC structure generates two bypass windows through a capacitor-splitting technique, which can skip unnecessary quantization steps. The judgment module of the bypass windows requires only an XOR gate. By introducing redundant capacitors to participate in quantization, the total capacitance value is reduced by half. The proposed SAR ADC is fabricated using a standard 180-nm CMOS process. The measurement results show that it can achieve an effective number of bits (ENOB) of 9.38 bits and a spurious-free dynamic range (SFDR) of 76.71 dB with a supply voltage of 0.6 V at a sampling rate ($F_{\mathrm{S}}$) of 6.94 kS/s. The power consumption is 15.61 nW when subjected to a 1.17-$\text{V}_{\mathrm{PP}}$ 3.45-kHz sinusoidal input, resulting in a figure of merit (FoM) of 3.38 fJ/conv.-step. The average power consumption for quantizing 12-lead ECG signals is approximately 12.66 nW, demonstrating the ability to achieve ultralow-power quantization of ECG signals.
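The reported figure of merit can be cross-checked from the other numbers in the abstract. The sketch below assumes the Walden definition, FoM = P / (2^ENOB × F_S), which the abstract does not name explicitly. The function name `walden_fom` is hypothetical, and the inputs are the measured values quoted above.

```python
def walden_fom(power_w: float, enob_bits: float, sample_rate_hz: float) -> float:
    """Energy per conversion step in joules, assuming the Walden FoM."""
    return power_w / (2 ** enob_bits * sample_rate_hz)


# Values quoted in the abstract: 15.61 nW, 9.38 bits, 6.94 kS/s.
fom_fj = walden_fom(15.61e-9, 9.38, 6.94e3) * 1e15
print(f"{fom_fj:.2f} fJ/conv.-step")  # 3.38 fJ/conv.-step
```

Under that assumption the result agrees with the stated 3.38 fJ/conv.-step, which suggests the authors used the Walden form with the 15.61-nW sinusoidal-input power rather than the 12.66-nW ECG average.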
Journal Article. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 1838-1847.
Citations: 0
A High-Density eDRAM Macro With Programmable Sense Amplifier and TG-Shifter for Logical-Instruction-Based In-Memory Computing
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-04-28 DOI: 10.1109/TVLSI.2025.3561507
Kunyao Lai;Enyi Yao;Zhenxing Li;Yongkui Yang
Embedded DRAM (eDRAM) has been widely adopted as on-chip cache memory in modern processors due to its high density. In this article, we propose a 2T gain-cell eDRAM-based macro that functions not only as traditional cache memory but also as an in-memory computing unit capable of performing logic operations. Furthermore, this eDRAM macro features in situ storing, completely eliminating the need for external memory or register access during computation. The sense amplifier in this macro is equipped with a programmable voltage reference, enabling support for various Boolean logic operations, including AND/NAND, OR/NOR, and NOT. In addition, the macro integrates a transmission-gate (TG)-based shifter cluster to perform data shifting, which is commonly required in general computations. To enhance functionality, we design an instruction set that supports compound logic computations, allowing Boolean logic, shifting, and in situ storage to be executed within a single instruction. We validated this eDRAM macro in a 32-kb bitcell array using the 40-nm logic CMOS technology. Compared with state-of-the-art designs, our macro achieves a relatively high density of 729.2 kb/mm² and a competitive logic energy of 14.1 fJ/bit.
Journal Article. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 2069-2073.
Citations: 0
CapsBeam: Accelerating Capsule Network-Based Beamformer for Ultrasound Nonsteered Plane-Wave Imaging on Field-Programmable Gate Array
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-04-25 DOI: 10.1109/TVLSI.2025.3559403
Abdul Rahoof;Vivek Chaturvedi;Mahesh Raveendranatha Panicker;Muhammad Shafique
In recent years, there has been a growing trend in accelerating computationally complex nonreal-time beamforming algorithms in ultrasound imaging using deep learning models. However, due to their large size and complexity, these state-of-the-art deep learning techniques pose significant challenges when deployed on resource-constrained edge devices. In this work, we propose a novel capsule network-based beamformer called CapsBeam, designed to operate on raw radio frequency data and provide an envelope of beamformed data through nonsteered plane-wave insonification. In experiments on in vivo data, CapsBeam reduced artifacts compared to standard Delay-and-Sum (DAS) beamforming. For in vitro data, CapsBeam demonstrated a 32.31% increase in contrast, along with gains of 16.54% and 6.7% in axial and lateral resolution, respectively, compared to DAS. Similarly, in silico data showed a 26% enhancement in contrast, along with improvements of 13.6% and 21.5% in axial and lateral resolution, respectively, compared to DAS. To reduce parameter redundancy and enhance computational efficiency, we pruned the model using our multilayer look-ahead kernel pruning (LAKP-ML) methodology, achieving a compression ratio of 85% without affecting the image quality. Additionally, the hardware complexity of the proposed model is reduced by applying quantization, simplifying nonlinear operations, and parallelizing operations. Finally, we propose a specialized accelerator architecture for the pruned and optimized CapsBeam model, implemented on a Xilinx ZU7EV FPGA. The proposed accelerator achieved a throughput of 30 GOPS for the convolution operation and 17.4 GOPS for the dynamic routing operation.
近年来,利用深度学习模型加速超声成像中计算复杂的非实时波束形成算法已成为一种发展趋势。然而,由于规模大和复杂性,这些最先进的深度学习技术在资源受限的边缘设备上部署时会带来重大挑战。在这项工作中,我们提出了一种新型的基于胶囊网络的波束形成器,称为CapsBeam,旨在对原始射频数据进行操作,并通过非操纵平面波不相干提供波束形成数据的包络。在体内数据实验中,与标准的延迟和和(DAS)波束形成相比,CapsBeam减少了伪影。对于体外数据,与DAS相比,CapsBeam的轴向和横向分辨率分别提高了16.54%和6.7%,相比之下,CapsBeam的对比度提高了32.31%。同样,与DAS相比,计算机数据显示对比度增强了26%,轴向和横向分辨率分别提高了13.6%和21.5%。为了减少参数冗余并提高计算效率,我们使用多层前瞻性核修剪(LAKP-ML)方法对模型进行修剪,在不影响图像质量的情况下实现了85%的压缩比。此外,通过量化、简化非线性运算和并行运算,降低了模型的硬件复杂度。最后,我们提出了一个专门的加速器架构,用于修剪和优化CapsBeam模型,并在Xilinx ZU7EV FPGA上实现。该加速器的卷积运算吞吐量为30 GOPS,动态路由运算吞吐量为17.4 GOPS。
Published in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 1934-1944, 2025-04-25. DOI: https://doi.org/10.1109/TVLSI.2025.3559403
Citations: 0
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information
IF 2.8 | Zone 2, Engineering & Technology | Q2, Computer Science, Hardware & Architecture | Pub Date: 2025-04-25 | DOI: 10.1109/TVLSI.2025.3557605
Published in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 5, p. C3, 2025-04-25. Open access: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10977653
Citations: 0
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information
IF 2.8 | Zone 2, Engineering & Technology | Q2, Computer Science, Hardware & Architecture | Pub Date: 2025-04-25 | DOI: 10.1109/TVLSI.2025.3557603
Published in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 5, p. C2, 2025-04-25. Open access: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10977654
Citations: 0