IEEE Transactions on Very Large Scale Integration (VLSI) Systems最新文献

英文中文

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information 电气和电子工程师学会超大规模集成 (VLSI) 系统学会论文集信息

IF 2.8 2区工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Pub Date : 2024-10-25 DOI: 10.1109/TVLSI.2024.3474954

引用次数: 0

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information IEEE 超大规模集成 (VLSI) 系统论文集出版信息

IF 2.8 2区工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Pub Date : 2024-10-25 DOI: 10.1109/TVLSI.2024.3474952

引用次数: 0

Electrical and Thermal Characteristics Optimization in Interposer-Based 2.5-D Integrated Circuits 基于互贴器的 2.5-D 集成电路中的电气和热特性优化

IF 2.8 2区工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Pub Date : 2024-10-25 DOI: 10.1109/TVLSI.2024.3478846

Changle Zhi;Gang Dong;Deguang Yang;Daihang Liu;Yinghao Feng;Yang Wang;Zhangming Zhu

In this work, a comprehensive analysis and optimization method of electrical and thermal characteristics in 2.5-D integrated circuits (ICs) is performed, including rapid heat distribution modeling, integrated voltage regulator (IVR) chip modeling, and power delivery network (PDN) modeling. Based on the proposed method, chiplet placement, decoupling placement, and IVR parameter settings that compromise the total PDN impedance, IVR impedance, and thermal distribution characteristics can be obtained. First, a rapid thermal analysis method for multiple heat sources is proposed by integrating the equivalent thermal resistance method and commercial tools. The thermal method significantly improves the computational efficiency and reduces the memory usage. Then, we analyze the electrical characteristics of a typical low dropout (LDO) and model the complete 2.5-D PDN, including interposers, chiplets, IVRs, through-silicon vias (TSVs), bumps, decoupling capacitors, and other components. The electrical and thermal problems in the 2.5-D system are formulated and a Metropolis rule-based algorithm is used to derive optimal solutions. Finally, the optimal placement schemes and parameter settings are iterated under different constraints. This method allows for the adjustment of target impedance, noise current, thermal limit, and other constraints based on varying practical situations. In the time-domain analysis, it can be found that the capacitance value is reduced while maintaining the power supply performance. With high accuracy in thermal and electrical modeling, this work provides an in-depth reference for the co-design of chiplet-based 2.5-D ICs.

{"title":"Electrical and Thermal Characteristics Optimization in Interposer-Based 2.5-D Integrated Circuits","authors":"Changle Zhi;Gang Dong;Deguang Yang;Daihang Liu;Yinghao Feng;Yang Wang;Zhangming Zhu","doi":"10.1109/TVLSI.2024.3478846","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3478846","url":null,"abstract":"In this work, a comprehensive analysis and optimization method of electrical and thermal characteristics in 2.5-D integrated circuits (ICs) is performed, including rapid heat distribution modeling, integrated voltage regulator (IVR) chip modeling, and power delivery network (PDN) modeling. Based on the proposed method, chiplet placement, decoupling placement, and IVR parameter settings that compromise the total PDN impedance, IVR impedance, and thermal distribution characteristics can be obtained. First, a rapid thermal analysis method for multiple heat sources is proposed by integrating the equivalent thermal resistance method and commercial tools. The thermal method significantly improves the computational efficiency and reduces the memory usage. Then, we analyze the electrical characteristics of a typical low dropout (LDO) and model the complete 2.5-D PDN, including interposers, chiplets, IVRs, through-silicon vias (TSVs), bumps, decoupling capacitors, and other components. The electrical and thermal problems in the 2.5-D system are formulated and a Metropolis rule-based algorithm is used to derive optimal solutions. Finally, the optimal placement schemes and parameter settings are iterated under different constraints. This method allows for the adjustment of target impedance, noise current, thermal limit, and other constraints based on varying practical situations. In the time-domain analysis, it can be found that the capacitance value is reduced while maintaining the power supply performance. With high accuracy in thermal and electrical modeling, this work provides an in-depth reference for the co-design of chiplet-based 2.5-D ICs.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 3","pages":"627-637"},"PeriodicalIF":2.8,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143489223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An Area/Power-Efficient Noise-Shaping SAR ADC for Implantable Biosensor Applications Featuring a Unique Auxiliary Feedback Loop

IF 2.8 2区工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Pub Date : 2024-10-23 DOI: 10.1109/TVLSI.2024.3477717

Weihao Wang;Kong-Pang Pun

Wide bandwidth and power-area efficient front-ends are required in emerging implantable biosensor applications, e.g., wireless artificial vision systems for the limited vision. This work presents a low-power 0.031 mm

$^{2}~3$

-MHz bandwidth noise-shaping (NS) successive approximation register (SAR) analog-to-digital converter (ADC) tailored for implantable biosensor applications. A highly energy- and area-efficient reset-free residue-processing scheme is proposed. This scheme facilitates noise transfer function (NTF) optimization by enabling control over complex poles. It establishes a unique auxiliary feedback path from the infinite impulse response (IIR) back to the finite impulse response (FIR) by eliminating the need for the FIR filter’s reset phase. This auxiliary feedback loop, which could not be achieved in previous FIR-IIR structures, is a key enabler for achieving high energy efficiency and design flexibility. The in-band quantization noise suppression is boosted by the proposed complex conjugate poles optimization technique, while a low

$kT/C$

input-referred noise has resulted. The prototype, fabricated in 65-nm CMOS technology with a 7-bit digital-to-analog converter (DAC), consumes

$370~mu $

W from a 1-V supply and occupies an active area of only

$0.17times 0.18$

mm (0.031-mm2). Operating at an 80-MHz oversampling frequency, the proposed ADC achieves a signal-to-noise and distortion ratio (SNDR) of 70.2 dB over a 3 MHz bandwidth, demonstrating its small area, high efficiency, and high performance for implantable biosensor applications that require high data rates.

{"title":"An Area/Power-Efficient Noise-Shaping SAR ADC for Implantable Biosensor Applications Featuring a Unique Auxiliary Feedback Loop","authors":"Weihao Wang;Kong-Pang Pun","doi":"10.1109/TVLSI.2024.3477717","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3477717","url":null,"abstract":"Wide bandwidth and power-area efficient front-ends are required in emerging implantable biosensor applications, e.g., wireless artificial vision systems for the limited vision. This work presents a low-power 0.031 mm<inline-formula> <tex-math>$^{2}~3$ </tex-math></inline-formula>-MHz bandwidth noise-shaping (NS) successive approximation register (SAR) analog-to-digital converter (ADC) tailored for implantable biosensor applications. A highly energy- and area-efficient reset-free residue-processing scheme is proposed. This scheme facilitates noise transfer function (NTF) optimization by enabling control over complex poles. It establishes a unique auxiliary feedback path from the infinite impulse response (IIR) back to the finite impulse response (FIR) by eliminating the need for the FIR filter’s reset phase. This auxiliary feedback loop, which could not be achieved in previous FIR-IIR structures, is a key enabler for achieving high energy efficiency and design flexibility. The in-band quantization noise suppression is boosted by the proposed complex conjugate poles optimization technique, while a low <inline-formula> <tex-math>$kT/C$ </tex-math></inline-formula> input-referred noise has resulted. The prototype, fabricated in 65-nm CMOS technology with a 7-bit digital-to-analog converter (DAC), consumes <inline-formula> <tex-math>$370~mu $ </tex-math></inline-formula>W from a 1-V supply and occupies an active area of only <inline-formula> <tex-math>$0.17times 0.18$ </tex-math></inline-formula> mm (0.031-mm2). Operating at an 80-MHz oversampling frequency, the proposed ADC achieves a signal-to-noise and distortion ratio (SNDR) of 70.2 dB over a 3 MHz bandwidth, demonstrating its small area, high efficiency, and high performance for implantable biosensor applications that require high data rates.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 3","pages":"685-696"},"PeriodicalIF":2.8,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143489213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A 4.2-to-0.5-V, 0.8-μA–0.8-mA, Power-Efficient Three-Level SIMO Buck Converter for a Quad-Voltage RISC-V Microprocessor 一种用于四电压RISC-V微处理器的4.2- 0.5 v, 0.8 μA - 0.8 ma，节能的三电平SIMO降压转换器

IF 2.8 2区工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Pub Date : 2024-10-23 DOI: 10.1109/TVLSI.2024.3477632

Dongkwun Kim;Zhaoqing Wang;Paul Xuanyuanliang Huang;Pavan Kumar Chundi;Suhwan Kim;Andrés A. Blanco;Ram K. Krishnamurthy;Mingoo Seok

This article presents a Li-ion battery-compatible single-inductor-multiple-output (SIMO) buck converter that fulfills the power management need of an integrated sub-mW RISC-V microprocessor. The proposed converter can directly take a 4.2-V battery voltage and produce four power rails ranging from 1.8 V for I/O to 0.5 V for the processor core. The three-level input stage is chosen to reduce the inductor ripple size and switching loss, thus increasing power conversion efficiency (PCE). In addition, the fully digital implementation using novel domino flash analog-digital converters (ADCs) enables low static current. Also, pulse frequency modulation (PFM) results in a wide dynamic range. The proposed three-level SIMO converter has been prototyped in a 65-nm CMOS technology with the 32-bit RISC-V processor. Measurement results show that the converter achieves a

$1000times $

load current range (

$0.8~mu $

A–0.8 mA) to support the active or sleep modes of the processor. The converter marks the PCE of 56.2%–72.8%. Compared to the ideal buck-low-dropout voltage regulator (LDO) architecture (LDO-only), it improves the PCE by 23.8% (46.4%).

本文提出了一种锂离子电池兼容的单电感多输出（SIMO）降压转换器，它满足了集成的毫瓦以下RISC-V微处理器的电源管理需求。所提出的转换器可以直接采用4.2 V的电池电压，并产生从1.8 V的I/O到0.5 V的处理器核心的四个电源轨。选择三电平输入级是为了减小电感纹波大小和开关损耗，从而提高功率转换效率。此外，采用新型多米诺闪存模数转换器（adc）的全数字实现可以实现低静态电流。此外，脉冲频率调制（PFM）具有宽的动态范围。所提出的三电平SIMO转换器已采用65纳米CMOS技术和32位RISC-V处理器进行原型设计。测量结果表明，该转换器实现了$1000times $负载电流范围（$0.8~ $ a - 0.8 mA），以支持处理器的活动或休眠模式。转换器的PCE为56.2%-72.8%。与理想的buck-low-dropout电压调节器（LDO）架构（仅LDO）相比，PCE提高了23.8%（46.4%）。

{"title":"A 4.2-to-0.5-V, 0.8-μA–0.8-mA, Power-Efficient Three-Level SIMO Buck Converter for a Quad-Voltage RISC-V Microprocessor","authors":"Dongkwun Kim;Zhaoqing Wang;Paul Xuanyuanliang Huang;Pavan Kumar Chundi;Suhwan Kim;Andrés A. Blanco;Ram K. Krishnamurthy;Mingoo Seok","doi":"10.1109/TVLSI.2024.3477632","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3477632","url":null,"abstract":"This article presents a Li-ion battery-compatible single-inductor-multiple-output (SIMO) buck converter that fulfills the power management need of an integrated sub-mW RISC-V microprocessor. The proposed converter can directly take a 4.2-V battery voltage and produce four power rails ranging from 1.8 V for I/O to 0.5 V for the processor core. The three-level input stage is chosen to reduce the inductor ripple size and switching loss, thus increasing power conversion efficiency (PCE). In addition, the fully digital implementation using novel domino flash analog-digital converters (ADCs) enables low static current. Also, pulse frequency modulation (PFM) results in a wide dynamic range. The proposed three-level SIMO converter has been prototyped in a 65-nm CMOS technology with the 32-bit RISC-V processor. Measurement results show that the converter achieves a \u0000<inline-formula> <tex-math>$1000times $ </tex-math></inline-formula>\u0000 load current range (\u0000<inline-formula> <tex-math>$0.8~mu $ </tex-math></inline-formula>\u0000A–0.8 mA) to support the active or sleep modes of the processor. The converter marks the PCE of 56.2%–72.8%. Compared to the ideal buck-low-dropout voltage regulator (LDO) architecture (LDO-only), it improves the PCE by 23.8% (46.4%).","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"193-206"},"PeriodicalIF":2.8,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142918375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Improving a Ka-Band Integrated Balanced Power Amplifier Performance by Compensating Quadrature Hybrid Mismatch Effects 通过补偿正交杂化失配效应改善ka波段集成平衡功率放大器性能

IF 2.8 2区工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Pub Date : 2024-10-21 DOI: 10.1109/TVLSI.2024.3475810

Jere Rusanen;Negar Shabanzadeh;Aarno Pärssinen;Timo Rahkonen;Janne P. Aikio

This article presents an integrated quadrature balanced power amplifier (PA) operating at a 26-GHz frequency range and techniques to mitigate the frequency-dependent amplitude response of quadrature hybrids used in the balanced amplifier design. The overall structure consists of two stacked pseudo-differential PAs and transformer-based quadrature hybrids designed with 22-nm CMOS FDSOI. Two techniques to compensate frequency-dependent amplitude response of the quadrature hybrid when operating away from the center frequency are proposed. The first one involves a dual input drive and the second one involves asymmetric biasing. With distortion contribution analysis, it is shown that asymmetric biasing compensates quadrature hybrid asymmetry but also produces mutually compensating third-order nonlinearity, resulting in improved linearity. Measurements with continuous wave (CW) and high dynamic range fifth generation (5G) modulated signal demonstrate that the described techniques improve output power that can be reached within the linearity specifications when operating away from the center frequency.

本文介绍了一种在 26 GHz 频率范围内工作的集成正交平衡功率放大器 (PA)，以及减轻平衡放大器设计中使用的正交混合放大器的频率相关振幅响应的技术。整体结构包括两个堆叠式伪差分功率放大器和基于变压器的正交混合放大器，采用 22 纳米 CMOS FDSOI 设计。在远离中心频率工作时，提出了两种补偿正交混合放大器频率相关振幅响应的技术。第一种涉及双输入驱动，第二种涉及非对称偏置。通过失真贡献分析表明，非对称偏置不仅能补偿正交混合放大器的不对称，还能产生相互补偿的三阶非线性，从而提高线性度。连续波（CW）和高动态范围第五代（5G）调制信号的测量结果表明，所述技术提高了输出功率，在远离中心频率工作时，输出功率可达到线性度规范要求。

{"title":"Improving a Ka-Band Integrated Balanced Power Amplifier Performance by Compensating Quadrature Hybrid Mismatch Effects","authors":"Jere Rusanen;Negar Shabanzadeh;Aarno Pärssinen;Timo Rahkonen;Janne P. Aikio","doi":"10.1109/TVLSI.2024.3475810","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3475810","url":null,"abstract":"This article presents an integrated quadrature balanced power amplifier (PA) operating at a 26-GHz frequency range and techniques to mitigate the frequency-dependent amplitude response of quadrature hybrids used in the balanced amplifier design. The overall structure consists of two stacked pseudo-differential PAs and transformer-based quadrature hybrids designed with 22-nm CMOS FDSOI. Two techniques to compensate frequency-dependent amplitude response of the quadrature hybrid when operating away from the center frequency are proposed. The first one involves a dual input drive and the second one involves asymmetric biasing. With distortion contribution analysis, it is shown that asymmetric biasing compensates quadrature hybrid asymmetry but also produces mutually compensating third-order nonlinearity, resulting in improved linearity. Measurements with continuous wave (CW) and high dynamic range fifth generation (5G) modulated signal demonstrate that the described techniques improve output power that can be reached within the linearity specifications when operating away from the center frequency.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2198-2209"},"PeriodicalIF":2.8,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10726609","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Novel Prediction-Based Two-Tiered ECC for Mitigating SWD Errors in HBM 一种新的基于预测的双层ECC减轻HBM中SWD误差

IF 2.8 2区工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Pub Date : 2024-10-21 DOI: 10.1109/TVLSI.2024.3474791

Youngki Moon;Seung Ho Shin;Seokjun Jang;Duyeon Won;Sungho Kang

Errors emerge as a major issue in the reliability of dynamic random access memory (DRAM). To enhance reliability, a two-tiered error correction code (ECC) architecture that comprises on-die ECC (OD-ECC) and system ECC (S-ECC) is adopted as a part of the standard for state-of-the-art high-bandwidth memory (HBM). However, conventional ECCs are insufficient to mitigate malfunctions of subwordline drivers (SWDs), a primary cause of errors. Moreover, the efficient co-design of two-tiered ECCs has not been sufficiently studied. To address these issues without increasing the size of check bits, this article proposes a two-tiered ECC architecture comprising an OD-ECC based on prediction and an S-ECC with data deinterleaving. The proposed OD-ECC predicts the SWD errors by leveraging the detection capabilities of two interleaved Reed-Solomon (RS) engines. In addition, the proposed S-ECC not only preserves strong error detection capability but also masks the misprediction effect of OD-ECC, where data deinterleaving renders additional errors caused by misprediction of OD-ECC to be bounded in the detectable range of the employed cyclic redundancy check (CRC). The experimental results demonstrate that the proposed two-tiered ECC can significantly enhance the error correction capability for SWD errors while maintaining the correction capability for other types of errors.

错误是动态随机存取存储器（DRAM）可靠性中的一个主要问题。为了提高可靠性，采用了两层ECC （error correction code）架构，包括片上ECC （OD-ECC）和系统ECC (S-ECC)，作为最先进的高带宽内存（HBM）标准的一部分。然而，传统的ECCs不足以减轻子词线驱动程序（swd）的故障，这是导致错误的主要原因。此外，两层ECCs的有效协同设计还没有得到充分研究。为了在不增加校验位大小的情况下解决这些问题，本文提出了一种两层ECC架构，包括基于预测的OD-ECC和具有数据去交错的S-ECC。提出的OD-ECC通过利用两个交错Reed-Solomon （RS）引擎的检测能力来预测SWD误差。此外，本文提出的S-ECC不仅保留了强大的错误检测能力，而且掩盖了OD-ECC的误预测效应，其中数据去交错使得OD-ECC误预测引起的额外错误被限制在所采用的循环冗余校验（CRC）的可检测范围内。实验结果表明，本文提出的两层ECC在保持对其他类型错误的纠错能力的同时，显著增强了对SWD错误的纠错能力。

{"title":"A Novel Prediction-Based Two-Tiered ECC for Mitigating SWD Errors in HBM","authors":"Youngki Moon;Seung Ho Shin;Seokjun Jang;Duyeon Won;Sungho Kang","doi":"10.1109/TVLSI.2024.3474791","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3474791","url":null,"abstract":"Errors emerge as a major issue in the reliability of dynamic random access memory (DRAM). To enhance reliability, a two-tiered error correction code (ECC) architecture that comprises on-die ECC (OD-ECC) and system ECC (S-ECC) is adopted as a part of the standard for state-of-the-art high-bandwidth memory (HBM). However, conventional ECCs are insufficient to mitigate malfunctions of subwordline drivers (SWDs), a primary cause of errors. Moreover, the efficient co-design of two-tiered ECCs has not been sufficiently studied. To address these issues without increasing the size of check bits, this article proposes a two-tiered ECC architecture comprising an OD-ECC based on prediction and an S-ECC with data deinterleaving. The proposed OD-ECC predicts the SWD errors by leveraging the detection capabilities of two interleaved Reed-Solomon (RS) engines. In addition, the proposed S-ECC not only preserves strong error detection capability but also masks the misprediction effect of OD-ECC, where data deinterleaving renders additional errors caused by misprediction of OD-ECC to be bounded in the detectable range of the employed cyclic redundancy check (CRC). The experimental results demonstrate that the proposed two-tiered ECC can significantly enhance the error correction capability for SWD errors while maintaining the correction capability for other types of errors.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"488-498"},"PeriodicalIF":2.8,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Designing Precharge-Free Energy-Efficient Content-Addressable Memories 设计无预充电的高能效内容可寻址存储器

IF 2.8 2区工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Pub Date : 2024-10-18 DOI: 10.1109/TVLSI.2024.3475036

Ramiro Taco;Esteban Garzón;Robert Hanhan;Adam Teman;Leonid Yavits;Marco Lanuzza

Content-addressable memory (CAM) is a specialized type of memory that facilitates massively parallel comparison of a search pattern against its entire content. State-of-the-art (SOTA) CAM solutions are either fast but power-hungry (NOR CAM) or slow while consuming less power (nand CAM). These limitations stem from the dynamic precharge operation, leading to excessive power consumption in NOR CAMs and charge-sharing issues in NAND CAMs. In this work, we propose a precharge-free CAM (PCAM) class for energy-efficient applications. By avoiding precharge operation, PCAM consumes less energy than a NAND CAM, while achieving search speed comparable to a NOR CAM. PCAM was designed using a 65-nm CMOS technology and comprehensively evaluated under extensive Monte Carlo (MC) simulations while taking into account layout parasitics. When benchmarked against conventional NAND CAM, PCAM demonstrates improved search run time (reduced by more than 30%) and 15% less search energy. Moreover, PCAM can cut energy consumption by more than 75% when compared to conventional NOR CAM. We further extend our analysis to the application level, functionally evaluating the CAM designs as a fully associative cache using a CPU simulator running various benchmark workloads. This analysis confirms that PCAMs represent an optimal energy-performance design choice for associative memories and their broad spectrum of applications.

内容可寻址内存（CAM）是一种专门的内存类型，便于大规模并行比较搜索模式与整个内容。最先进的 CAM 解决方案要么速度快但功耗高（NOR CAM），要么速度慢但功耗低（nand CAM）。这些限制源于动态预充电操作，导致 NOR CAM 的功耗过高和 NAND CAM 的充电共享问题。在这项工作中，我们为高能效应用提出了一种免预充电 CAM（PCAM）类别。通过避免预充电操作，PCAM 的能耗低于 NAND CAM，同时搜索速度与 NOR CAM 相当。PCAM 采用 65 纳米 CMOS 技术设计，并在广泛的蒙特卡罗（MC）模拟中进行了全面评估，同时考虑了布局寄生效应。与传统的 NAND CAM 相比，PCAM 的搜索运行时间缩短了 30% 以上，搜索能耗降低了 15%。此外，与传统的 NOR CAM 相比，PCAM 可以减少 75% 以上的能耗。我们进一步将分析扩展到应用层面，使用运行各种基准工作负载的 CPU 模拟器，将 CAM 设计作为全关联高速缓存进行功能评估。这项分析证实，PCAM 是关联存储器及其广泛应用的最佳能效设计选择。

{"title":"Designing Precharge-Free Energy-Efficient Content-Addressable Memories","authors":"Ramiro Taco;Esteban Garzón;Robert Hanhan;Adam Teman;Leonid Yavits;Marco Lanuzza","doi":"10.1109/TVLSI.2024.3475036","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3475036","url":null,"abstract":"Content-addressable memory (CAM) is a specialized type of memory that facilitates massively parallel comparison of a search pattern against its entire content. State-of-the-art (SOTA) CAM solutions are either fast but power-hungry (NOR CAM) or slow while consuming less power (nand CAM). These limitations stem from the dynamic precharge operation, leading to excessive power consumption in NOR CAMs and charge-sharing issues in NAND CAMs. In this work, we propose a precharge-free CAM (PCAM) class for energy-efficient applications. By avoiding precharge operation, PCAM consumes less energy than a NAND CAM, while achieving search speed comparable to a NOR CAM. PCAM was designed using a 65-nm CMOS technology and comprehensively evaluated under extensive Monte Carlo (MC) simulations while taking into account layout parasitics. When benchmarked against conventional NAND CAM, PCAM demonstrates improved search run time (reduced by more than 30%) and 15% less search energy. Moreover, PCAM can cut energy consumption by more than 75% when compared to conventional NOR CAM. We further extend our analysis to the application level, functionally evaluating the CAM designs as a fully associative cache using a CPU simulator running various benchmark workloads. This analysis confirms that PCAMs represent an optimal energy-performance design choice for associative memories and their broad spectrum of applications.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2303-2314"},"PeriodicalIF":2.8,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10723100","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Study on Nonlinearity in Mixers Using a Time-Varying Volterra-Based Distortion Contribution Analysis Tool 使用基于时变伏特拉失真贡献分析工具的混频器非线性研究

IF 2.8 2区工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Pub Date : 2024-10-17 DOI: 10.1109/TVLSI.2024.3474183

Negar Shabanzadeh;Aarno Pärssinen;Timo Rahkonen

To optimize the linearity of mixers, one needs to recognize the origins and mixing mechanisms of dominant nonlinearities. This article presents a time-varying (TV) nonlinear analysis, where TV polynomial models are used to yield harmonic mixing functions in four simple mixer structures. Local oscillator (LO) waveform engineering has been utilized to tailor the nonlinearity coefficients, which were later fed into a numerical distortion contribution analysis tool to calculate the nonlinearity using a Volterra series-based approach. The spectra of the nonlinear voltages have been drawn, followed by plots breaking the final IM3 distortion down into its contributions. The effect of filtering on distortion has been illustrated in the end.

为了优化混频器的线性度，我们需要认识主要非线性的起源和混合机制。本文介绍了一种时变（TV）非线性分析，利用 TV 多项式模型在四个简单混频器结构中产生谐波混合函数。本地振荡器（LO）波形工程被用来定制非线性系数，然后将其输入数值失真贡献分析工具，利用基于 Volterra 系列的方法计算非线性。绘制了非线性电压的频谱，然后绘制了将最终 IM3 失真分解为其贡献的图谱。最后还说明了滤波对失真产生的影响。

引用次数: 0

Bandwidth-Latency-Thermal Co-Optimization of Interconnect-Dominated Many-Core 3D-IC 互连主导多核3D-IC的带宽-延迟-热协同优化

IF 2.8 2区工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Pub Date : 2024-10-16 DOI: 10.1109/TVLSI.2024.3467148

Sudipta Das;Samuel Riedel;Mohamed Naeim;Moritz Brunion;Marco Bertuletti;Luca Benini;Julien Ryckaert;James Myers;Dwaipayan Biswas;Dragomir Milojevic

The ongoing integration of advanced functionalities in contemporary system-on-chips (SoCs) poses significant challenges related to memory bandwidth, capacity, and thermal stability. These challenges are further amplified with the advancement of artificial intelligence (AI), necessitating enhanced memory and interconnect bandwidth and latency. This article presents a comprehensive study encompassing architectural modifications of an interconnect-dominated many-core SoC targeting the significant increase of intermediate, on-chip cache memory bandwidth and access latency tuning. The proposed SoC has been implemented in 3-D using A10 nanosheet technology and early thermal analysis has been performed. Our workload simulations reveal, respectively, up to 12- and 2.5-fold acceleration in the 64-core and 16-core versions of the SoC. Such speed-up comes at 40% increase in die-area and a 60% rise in power dissipation when implemented in 2-D. In contrast, the 3-D counterpart not only minimizes the footprint but also yields 20% power savings, attributable to a 40% reduction in wirelength. The article further highlights the importance of pipeline restructuring to leverage the potential of 3-D technology for achieving lower latency and more efficient memory access. Finally, we discuss the thermal implications of various 3-D partitioning schemes in High Performance Computing (HPC) and mobile applications. Our analysis reveals that, unlike high-power density HPC cases, 3-D mobile case increases

$T_{max }$

only by

$2~^{circ } $

C–

$3~^{circ } $

C compared to 2-D, while the HPC scenario analysis requires multiconstrained efficient partitioning for 3-D implementations.

现代系统芯片（soc）不断集成高级功能，对内存带宽、容量和热稳定性提出了重大挑战。随着人工智能（AI）的进步，这些挑战进一步放大，需要增强内存、互连带宽和延迟。本文提出了一项全面的研究，包括互连主导的多核SoC的架构修改，目标是显著增加中间，片上缓存存储器带宽和访问延迟调优。所提出的SoC已经使用A10纳米片技术实现了3d，并进行了早期热分析。我们的工作负载模拟显示，64核和16核版本的SoC分别具有高达12倍和2.5倍的加速。当在2d中实现时，这种加速带来了40%的模面积增加和60%的功耗增加。相比之下，3d打印不仅可以最大限度地减少占地面积，还可以节省20%的电力，因为无线长度减少了40%。本文进一步强调了管道重组的重要性，以利用3d技术的潜力来实现更低的延迟和更高效的内存访问。最后，我们讨论了各种三维分区方案在高性能计算（HPC）和移动应用中的热影响。我们的分析表明，与高功率密度HPC情况不同，3- d移动情况下，与2- d相比，$T_{max}$仅增加$2~^{circ}$ C - $3~^{circ}$ C，而HPC场景分析需要多约束的高效分区来实现3- d实现。

{"title":"Bandwidth-Latency-Thermal Co-Optimization of Interconnect-Dominated Many-Core 3D-IC","authors":"Sudipta Das;Samuel Riedel;Mohamed Naeim;Moritz Brunion;Marco Bertuletti;Luca Benini;Julien Ryckaert;James Myers;Dwaipayan Biswas;Dragomir Milojevic","doi":"10.1109/TVLSI.2024.3467148","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3467148","url":null,"abstract":"The ongoing integration of advanced functionalities in contemporary system-on-chips (SoCs) poses significant challenges related to memory bandwidth, capacity, and thermal stability. These challenges are further amplified with the advancement of artificial intelligence (AI), necessitating enhanced memory and interconnect bandwidth and latency. This article presents a comprehensive study encompassing architectural modifications of an interconnect-dominated many-core SoC targeting the significant increase of intermediate, on-chip cache memory bandwidth and access latency tuning. The proposed SoC has been implemented in 3-D using A10 nanosheet technology and early thermal analysis has been performed. Our workload simulations reveal, respectively, up to 12- and 2.5-fold acceleration in the 64-core and 16-core versions of the SoC. Such speed-up comes at 40% increase in die-area and a 60% rise in power dissipation when implemented in 2-D. In contrast, the 3-D counterpart not only minimizes the footprint but also yields 20% power savings, attributable to a 40% reduction in wirelength. The article further highlights the importance of pipeline restructuring to leverage the potential of 3-D technology for achieving lower latency and more efficient memory access. Finally, we discuss the thermal implications of various 3-D partitioning schemes in High Performance Computing (HPC) and mobile applications. Our analysis reveals that, unlike high-power density HPC cases, 3-D mobile case increases <inline-formula> <tex-math>$T_{max }$ </tex-math></inline-formula> only by <inline-formula> <tex-math>$2~^{circ } $ </tex-math></inline-formula>C–<inline-formula> <tex-math>$3~^{circ } $ </tex-math></inline-formula>C compared to 2-D, while the HPC scenario analysis requires multiconstrained efficient partitioning for 3-D implementations.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"346-357"},"PeriodicalIF":2.8,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142992942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

首页上一页

下一页尾页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀