
IEEE Solid-State Circuits Letters: Latest Articles

A 0.19-PEF Bandwidth/Power Scalable Dynamic Amplifier
IF 2 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-02-27 DOI: 10.1109/LSSC.2026.3668876
Yihang Cheng;Yihan Zhang;Huaqiang Wu;Sining Pan
This letter presents an energy-efficient dynamic amplifier. It utilizes source-coupled input boosting and time-domain differential sampling techniques to boost the effective input signal by 4× compared to its floating inverter amplifier (FIA) prototype without noise or power penalties. With discharge-based dynamic biasing, the bandwidth (BW) and power of the amplifier can be scaled by 100×. Fabricated in a standard 0.18-µm CMOS technology, the amplifier achieves a state-of-the-art power efficiency factor (PEF) of 0.19, which is 16× better than that of a standard FIA. It also achieves a scalable BW/power range from 0.5 kHz/2.3 nW to 50 kHz/206 nW.
IEEE Solid-State Circuits Letters, vol. 9, pp. 93-96.
Citations: 0
A 7056-PPI Pixel Circuit With Low-Leakage Structure for Active-Matrix Monochrome Micro-LED Displays
IF 2 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-02-24 DOI: 10.1109/LSSC.2026.3667520
Chih-Lung Lin;Cheng-Han Ke;Yu-Chang Chiu;Yuan-Yu Lai;Yi-Chien Chen;Cheng-Rui Lu;Yu-Hsiang Fu;Chih-Chuan Huang
This work presents a 5T2C pixel circuit for active-matrix (AM) micro-displays in near-eye display applications. The circuit supports monochrome micro light-emitting diode (micro-LED) displays with ultrahigh resolution of 7056 pixels per inch (PPI). The circuit is designed and fabricated based on medium-voltage (MV) devices from the 55-nm high-voltage (HV) CMOS process. The total storage capacitance is only 6.588 fF, and the stored charge is susceptible to transistor leakage current, resulting in significant deviations in the average driving currents. To mitigate the deviations, the circuit isolates the switching transistor (TS) from the data signals, increases the body voltage for TS, and minimizes the body area of TS. The low-leakage structure effectively reduces off-leakage current (IOFF) and reverse-bias p-n-junction leakage current (IREV) of the transistors. Simulation results demonstrate that the proposed circuit produces slight current deviations compared to a conventional 2T1C circuit. Measurement results also confirm that the deviation rate is less than 6.23% for all gray levels (GLs).
IEEE Solid-State Circuits Letters, vol. 9, pp. 89-92.
Citations: 0
Analysis and Design of Power Amplifier Using Parallel-Combined Multisegment Transformer
IF 2 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-02-16 DOI: 10.1109/LSSC.2026.3665218
Geuntae Kim;Hyunjin Ahn;Kyutaek Oh;Ilku Nam;Ockgoo Lee
This letter presents a highly efficient power amplifier (PA) for 5G new radio (NR) applications operating in bands n257 and n258, using a parallel-combined vertical multisegment transformer in a 65-nm bulk CMOS process. The multisegment transformer presents a lower input impedance than a conventional transformer, enabling the PA to achieve a higher output power. Owing to its high coupling factor, the vertical multisegment transformer also offers lower insertion loss than the conventional transformer. In addition, this work demonstrates that a compact parallel T-line combined with vertical multisegment transformers achieves a low-loss characteristic by providing an appropriately low input impedance for high output power. To achieve high efficiency in the low-power (LP) region, the PA operates in dual mode by applying discrete power control. The PA achieves 24.87-dBm saturated output power with 40.27% peak power-added efficiency (PAE) at 27 GHz. Furthermore, it achieves average output powers of 17.6 and 13.8 dBm with average PAEs of 14.02% and 12.31% in the high-power and LP modes, respectively, for an 800-Msym/s 5G OFDM signal using 64-QAM at −25-dB error vector magnitude.
IEEE Solid-State Circuits Letters, vol. 9, pp. 85-88.
Citations: 0
An Aging-Robust 32-MHz RC Frequency Reference With 0.4-ppm Allan Deviation and ±1550-ppm Inaccuracy From −40 °C to 125 °C After a 1-Point Trim
IF 2 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-02-13 DOI: 10.1109/LSSC.2026.3664504
Sining Pan;Xiaohan Liu;Junlong Zeng;Yihang Cheng;Kofi. A. A. Makinwa;Huaqiang Wu
This letter presents an aging-robust 32-MHz RC frequency reference based on a frequency-locked-loop (FLL). With a temperature compensation scheme that combines BJTs and aging-robust diffusion resistors, the FLL achieves ±1550-ppm inaccuracy from −40 °C to 125 °C after batch calibration and a low-cost 1-point trim, which increases to ±2350 ppm after accelerated aging. Due to the extensive use of dynamic error-correction techniques, the FLL also achieves a state-of-the-art Allan deviation floor of 0.4 ppm.
IEEE Solid-State Circuits Letters, vol. 9, pp. 81-84.
Citations: 0
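To put the relative figures quoted for the 32-MHz RC reference above into absolute terms, the short sketch below converts parts per million into a frequency offset in hertz. The helper name `ppm_to_hz` is hypothetical and introduced only for illustration; the input numbers are taken from the abstract.

```python
def ppm_to_hz(nominal_hz: float, ppm: float) -> float:
    """Convert a relative error in parts per million to an absolute offset in Hz."""
    return nominal_hz * ppm * 1e-6


F_NOMINAL_HZ = 32e6  # 32-MHz output frequency, as stated in the abstract

# Figures quoted in the abstract (relative to the nominal frequency).
for label, ppm in [
    ("inaccuracy after 1-point trim", 1550.0),
    ("inaccuracy after accelerated aging", 2350.0),
    ("Allan deviation floor", 0.4),
]:
    offset_hz = ppm_to_hz(F_NOMINAL_HZ, ppm)
    print(f"{label}: {ppm} ppm = {offset_hz:.1f} Hz")
```

Under these numbers, ±1550 ppm corresponds to roughly ±49.6 kHz around 32 MHz, and the 0.4-ppm Allan deviation floor to about 12.8 Hz.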
A 70-GHz Bandwidth Amplifier With Integrated Differential Bridged T-coil Peaking and Uniform Group Delay
IF 2 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-02-06 DOI: 10.1109/LSSC.2026.3661366
Giovanni Scarlato;John R. Long
A two-stage amplifier in 22-nm FD-SOI CMOS integrates a fully-differential bridged T-coil for the first time. Circuit performance is benchmarked against an identical amplifier topology designed with single-ended T-coils (pseudo-differential) and an unpeaked reference. It realizes 70-GHz bandwidth with 12 ± 2-ps group delay and >10-dB return loss across 90 GHz. Bandwidth is 2.2× greater than the unpeaked reference circuit and is 58% larger than a pseudo-differential equivalent. The 54 × 56-µm² differential T-coil occupies one-third the area of two single-ended coils.
IEEE Solid-State Circuits Letters, vol. 9, pp. 65-68.
Citations: 0
A Folded-Differential Switched-Capacitor SRAM CIM Macro With Scalable MAC Sizes for TinyML Inference
IF 2 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-02-06 DOI: 10.1109/LSSC.2026.3662477
Zhonghao Chen;Ling-An Cheong;Tianyi Yu;Yiming Chen;Guodong Yin;Teng Yi;Yongpan Liu;Huazhong Yang;Xueqing Li
This letter presents a switched-capacitor SRAM compute-in-memory macro optimized for TinyML inference. Key features include: 1) an area-efficient folded-differential multiply-and-accumulate (FD-MAC) scheme to double the signal margin; 2) a closed-loop floating-inverter amplifier (FIA)-based charge accumulation technique for signal-to-noise ratio enhancement and multiply-and-accumulate (MAC) voltage integration; and 3) a sparsity-aware multistep MAC method to reduce A/D conversions and improve utilization. Fabricated in a 28-nm process, the 32-kb prototype achieves 68.7 TOPS/W energy efficiency and 1.74 TOPS/mm2 area efficiency in 8-bit mode.
IEEE Solid-State Circuits Letters, vol. 9, pp. 73-76.
Citations: 0
A 3-D HBI Compliant 1.536 TB/s/mm2 Bandwidth Scalable Attention Accelerator With 22.5-GOPS Throughput High Speed SoftMax for Quantized Transformers in Intel 3
IF 2 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-01-29 DOI: 10.1109/LSSC.2026.3659575
Prerna Budhkar;Mirco Sciulli;Srivatsa Rangachar Srinivasa;Gauthaman Murali;Ragh Kuttappa;Paolo Aseron;Trang Nguyen;Vinayak Honkote;Tanay Karnik
This letter presents a novel hardware accelerator compatible with <3-µm pitch 3-D Cu-Cu hybrid bonding interconnect (HBI) technology, particularly designed to efficiently execute multihead attention (MHA) of encoder transformer models. We present an accelerator that addresses performance losses due to low precision models by incorporating specialized hardware optimizations for quantizer and SoftMax engines. The proposed design features extremely wide SRAM/logic bandwidth for generic matrix multiplication (GEMM) parallelism, including on-the-fly transpose logic and high speed 2-Pass SoftMax, delivering 22.5-GOPS throughput. The Intel 3 prototype with a 3-D footprint of 1.2 mm2 achieves 25 668 Attention/s with no accuracy loss while running I-BERT.
IEEE Solid-State Circuits Letters, vol. 9, pp. 69-72.
Citations: 0
A 39.4-mW 300 MHz-BW 70.9 dB-SNDR Hybrid ADC With Resistive Input and 200 fs, rms-Jitter Tolerance
IF 2 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-01-20 DOI: 10.1109/LSSC.2026.3656180
Yanquan Luo;Mingtao Zhan;Yi Zhong;Nan Sun
This letter presents a power-efficient hybrid ADC architecture: a low-resolution continuous-time (CT) delta-sigma modulator (DSM) followed by a time-interleaved pipeline stage, which further quantizes the quantization noise of the DSM. In the frontend CT DSM, the resistive input makes the ADC easy to drive, and the direct-charge-dump feedback (DCD FB) provides high jitter immunity; the quantization of the backend is mainly performed by SAR ADCs, providing high power efficiency. Capacitor flipping is proposed in the frontend to implement an intrinsically linear 1.5b DCD FB. Nested time-interleaving is proposed in the backend in order to assign the major quantization work to SAR ADCs. Primary-secondary sampling with improved timing is utilized to eliminate the timing-skew issue while gaining more available sampling time and relaxing the backend noise requirement. The ADC is fabricated in a 28-nm CMOS process and achieves 70.9-dB SNDR in 300-MHz BW with 39.4-mW power consumption, yielding a 169.7-dB Schreier FoM, and the band-edge performance is preserved up to 200 fs rms clock jitter.
IEEE Solid-State Circuits Letters, vol. 9, pp. 61-64.
Citations: 0
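The Schreier figure of merit quoted for the hybrid ADC above can be cross-checked from the other numbers in its abstract. The sketch below assumes the commonly used definition FoM = SNDR + 10·log10(BW / P), with BW in hertz and P in watts; the function name `schreier_fom_db` is hypothetical.

```python
import math


def schreier_fom_db(sndr_db: float, bandwidth_hz: float, power_w: float) -> float:
    """Schreier figure of merit in dB: SNDR + 10*log10(BW / P)."""
    return sndr_db + 10.0 * math.log10(bandwidth_hz / power_w)


# Values from the abstract: 70.9-dB SNDR, 300-MHz bandwidth, 39.4-mW power.
fom = schreier_fom_db(sndr_db=70.9, bandwidth_hz=300e6, power_w=39.4e-3)
print(f"Schreier FoM = {fom:.1f} dB")
```

With the abstract's values this evaluates to about 169.7 dB, consistent with the reported figure.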
On-Chip Charge-Trap-Transistor-Based Mismatch Calibration of an 8-Bit Thermometer Current-Source DAC
IF 2 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-01-20 DOI: 10.1109/LSSC.2026.3656261
Mohammadreza Zeinali;Sudhakar Pamarti
This letter presents an on-chip mismatch calibration technique for current-source digital-to-analog converters (DACs) using charge-trap transistors (CTTs) in 22-nm FDSOI technology. The proposed method exploits programmable threshold voltage (VTH) shifts in CTTs to locally tune the current of near-minimum-sized devices without external trimming. A compact 8-bit thermometer DAC is implemented to demonstrate the concept. The on-chip calibration loop iteratively measures and programs each CTT using short high-voltage pulses until the CTT current matches a reference, achieving device-level current uniformity. Measurement results show an 8× reduction in current-source mismatch and linearity improvements to 0.1/0.5 LSB DNL/INL. The proposed approach provides a scalable, low-cost, and nonvolatile solution for analog calibration in deeply scaled CMOS technologies.
IEEE Solid-State Circuits Letters, vol. 9, pp. 53-56.
Citations: 0
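To illustrate how unit-current mismatch in a thermometer-coded DAC like the one above translates into DNL and INL, here is a minimal sketch. It is not the authors' calibration loop: the 5% mismatch sigma is a hypothetical value, calibration is idealized as shrinking every unit-current error by the 8× factor the abstract reports, and the helper names are invented for illustration. It does not attempt to reproduce the measured 0.1/0.5 LSB result.

```python
import random


def thermometer_dnl_inl(unit_currents):
    """Return (dnl, inl) in LSB for a thermometer-coded current-source DAC.

    Code k switches on the first k unit sources, so the k-th step equals
    unit_currents[k - 1]. The LSB is the mean unit current (endpoint fit).
    """
    lsb = sum(unit_currents) / len(unit_currents)
    dnl = [i / lsb - 1.0 for i in unit_currents]
    inl, running = [], 0.0
    for step_error in dnl:
        running += step_error  # INL is the running sum of DNL
        inl.append(running)
    return dnl, inl


def peak(values):
    """Largest absolute value in a list."""
    return max(abs(v) for v in values)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# 8-bit thermometer DAC: 255 unit sources with a hypothetical 5% sigma mismatch.
raw = [1.0 + rng.gauss(0.0, 0.05) for _ in range(255)]
# Idealized calibration: every unit-current error reduced by 8x.
calibrated = [1.0 + (i - 1.0) / 8.0 for i in raw]

for name, units in [("raw", raw), ("calibrated", calibrated)]:
    dnl, inl = thermometer_dnl_inl(units)
    print(f"{name}: peak DNL = {peak(dnl):.3f} LSB, peak INL = {peak(inl):.3f} LSB")
```

Because each step of a thermometer DAC is set by a single unit source, DNL tracks the individual device errors directly, while INL accumulates them across codes. That is why tuning each source toward a common reference improves both metrics.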
A Ring-Oscillator-Based Digital Harmonic-Mixing Fractional-N PLL
IF 2 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-01-14 DOI: 10.1109/LSSC.2026.3654239
Hongyu Lu;Nader Fathy;Patrick P. Mercier
This letter presents a low-jitter digital harmonic-mixing fractional-N phase-locked loop (PLL) using a ring oscillator. To extend the loop bandwidth, a mixer with unity gain in the phase domain is adopted, which helps suppress phase noise of the phase detector and delta-sigma modulator. Furthermore, to reduce mixing harmonics that would otherwise dominate the in-band jitter, the sinusoidal reference is buffered by a linear source follower, in contrast to the inverters used in other LC-oscillator-based harmonic-mixing PLLs. Implemented in 65 nm CMOS, the proposed PLL achieves 603.8 fs root-mean-square jitter with a 100 MHz integration bandwidth. It occupies 0.12 mm² of silicon area and consumes 12.72 mW of power.
IEEE Solid-State Circuits Letters, vol. 9, pp. 57-60.
Citations: 0
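The jitter and power figures quoted for the PLL above can be combined into a single jitter-power figure of merit. The abstract does not report this number; the sketch below assumes the widely used definition FoM = 10·log10[(σ / 1 s)² · (P / 1 mW)], and the function name `jitter_fom_db` is hypothetical.

```python
import math


def jitter_fom_db(rms_jitter_s: float, power_w: float) -> float:
    """Jitter-power FoM in dB: 10*log10((sigma / 1 s)^2 * (P / 1 mW))."""
    return 10.0 * math.log10((rms_jitter_s ** 2) * (power_w / 1e-3))


# Values from the abstract: 603.8-fs rms jitter, 12.72-mW power consumption.
fom = jitter_fom_db(rms_jitter_s=603.8e-15, power_w=12.72e-3)
print(f"Jitter FoM = {fom:.1f} dB")
```

A more negative value indicates lower jitter for a given power budget. The result depends on the integration bandwidth used for the jitter, which the abstract gives as 100 MHz.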