2021 IEEE International Solid-State Circuits Conference (ISSCC): Latest Papers, Page 5

A 60GHz 186.5dBc/Hz FoM Quad-Core Fundamental VCO Using Circular Triple-Coupled Transformer with No Mode Ambiguity in 65nm CMOS

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Pub Date : 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9366036

Haikun Jia, W. Deng, Pingda Guan, Zhihua Wang, B. Chi

The recent development of 5th-generation (5G) communication systems has set increasingly strict requirements on the spectral purity of millimeter-wave (mm-wave) local oscillators (LOs). Low phase noise is crucial to enable advanced modulation formats for high data rates. Much effort has been made to improve the phase noise of mm-wave LOs. A lower-frequency voltage-controlled oscillator (VCO) together with a frequency multiplier can lower the phase noise [1]; however, the high-order harmonic components in VCOs are usually very weak, so additional power-hungry mm-wave amplification stages are needed to satisfy the LO swing requirement. For single mm-wave fundamental VCOs, the minimum achievable phase noise is bounded by the smallest realizable inductor that still has a high Q factor. To avoid this “small inductor” problem, N oscillators with relatively large inductors can be coupled together to improve the phase noise by $10\log_{10}(N)$ dB [2-5]. The authors of [2] presented a quad-core bipolar VCO working around 15GHz, shown in Fig. 20.3.1 (Left), where four one-turn inductors are star-connected with the active cores placed in the middle. Resistors ($R_C$) are added to avoid undesired multi-tone concurrent oscillations. However, the four one-turn inductors still suffer from a Q-factor drop as the inductance decreases, which limits the highest achievable oscillation frequency. In addition, $V_{DD}$ at the inductor center taps and $V_{SS}$ at the tail current source are far from each other, making the $V_{DD}$-$V_{SS}$ current return path long. This path has to be carefully modeled in simulations, especially in the mm-wave frequency range, where the return-path inductance is comparable to the tank inductance.
Instead of the star-connected topology, the authors of [3] presented a circular-connected quad-core VCO working close to 30GHz, where the inductors are arranged in a circular topology, as shown in Fig. 20.3.1 (Middle). The destructive coupling between the inner edges inside a small inductor is eliminated, so the minimum realizable inductance is further reduced while keeping a high Q factor. The center taps are connected by narrow metal traces to avoid latching and mode ambiguity. The VCO adopts a CMOS configuration, which limits the highest operating frequency. It would be difficult to adopt this topology in NMOS-only VCOs because the center taps have to be resistively isolated to suppress unwanted modes; therefore, they cannot all be connected to the AC-ground power supply at the same time, as the NMOS-only configuration requires. Because the circular inductors lack harmonic impedance control, extra tail-filtering transformers are added to improve the phase noise.
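
As a small worked example of the $10\log_{10}(N)$ relation quoted above, the following sketch computes the ideal phase-noise improvement for N coupled cores. It assumes identical, ideally coupled oscillators; the function names and the -100 dBc/Hz single-core figure in the usage note are hypothetical and are not taken from the paper.

```python
import math


def coupling_gain_db(n_cores: int) -> float:
    """Ideal phase-noise improvement (dB) from coupling n identical oscillators."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return 10.0 * math.log10(n_cores)


def coupled_phase_noise(single_core_dbc_hz: float, n_cores: int) -> float:
    """Phase noise of the coupled array, given a (hypothetical) single-core value."""
    return single_core_dbc_hz - coupling_gain_db(n_cores)
```

For a quad-core design, `coupling_gain_db(4)` gives about 6 dB, so a hypothetical single core at -100 dBc/Hz would ideally reach about -106 dBc/Hz when four cores are coupled.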



Citations: 10

A Distortion-Free VCO-Based Sensor-to-Digital Front-End Achieving 178.9dB FoM and 128dB SFDR with a Calibration-Free Differential Pulse-Code Modulation Technique

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Pub Date : 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365950

Jiannan Huang, P. Mercier

Motion and stimulation artifacts encountered in wearable sensors present difficult dynamic range (DR) and linearity challenges: analog front-ends (AFEs) need to resolve $\mu$V-level signals in the presence of artifacts up to 100s of mV in amplitude while maintaining linearity without saturation, so that the signal of interest can be readily recovered during post-processing. Since it is not possible to build an amplifier with appreciable gain and linearity for >100 mV inputs under a <1 V SoC-compatible supply, most high-DR AFEs instead incorporate an LNA into a $\Delta\Sigma$-based ADC-direct architecture [1]–[3]. However, as many emerging wearable devices call for single-chip integration in scaled CMOS for size and digital performance reasons, conventional $\Delta\Sigma$ modulators ($\Delta\Sigma$Ms), which rely on voltage-domain building blocks, suffer from reduced intrinsic gain and headroom. Time-domain quantization through VCO-based AFEs, by contrast, benefits from scaled CMOS and offers intrinsic 1st-order noise shaping. However, the non-linear V-F conversion of conventional VCO-based AFEs makes achieving a large and linear DR difficult [1]. To address this, [3] adopts a differential pulse-code modulation (DPCM) technique that lets the VCO process only a small prediction error, $V_{ERR}$, by subtracting from $V_{IN}$ a digital predictor value fed through a DAC (Fig. 28.1.1 top). Maximal linearity would be achieved if the predictor were perfect, resulting in $V_{ERR} \approx 0$; however, this requires a high-performance and power-expensive DAC. Therefore, [3] truncates the predictor’s output, reducing the DAC requirement to 9b but adding a truncation error, $E_T$. If the gains of paths P1 and P2 are made equal, which is enforced in [3] via a gain error calibration (GEC) circuit, $E_T$ will ideally cancel at the output.
However, it is not possible to achieve perfect $E_T$ cancellation, and any residual $E_T$ will degrade SQNR, limiting the extent to which truncation can be used to relax the DAC’s resolution. In addition, GEC itself introduces power overhead.
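
The cancellation argument above can be illustrated with a toy numerical model. This is a minimal sketch, not the circuit of [3] or of this paper: integer codes stand in for voltages, quantization in the VCO path is ignored, and the names `truncate`, `dpcm_reconstruct`, `drop_bits`, `g1` and `g2` are hypothetical.

```python
def truncate(code: int, drop_bits: int) -> int:
    """Zero the lowest `drop_bits` bits of a non-negative integer code."""
    return (code >> drop_bits) << drop_bits


def dpcm_reconstruct(v_in, predictor, drop_bits, g1=1.0, g2=1.0):
    """Toy DPCM reconstruction with a truncated predictor.

    g1 is the gain of the error path (P1), g2 the gain of the predictor path (P2).
    Returns (reconstructed output, truncation error E_T).
    """
    coarse = truncate(predictor, drop_bits)  # truncated value sent to the DAC
    e_t = predictor - coarse                 # truncation error E_T
    v_err = v_in - coarse                    # residue handled by the error path
    out = g1 * v_err + g2 * coarse           # recombination at the output
    return out, e_t
```

With matched gains, `dpcm_reconstruct(1000, 997, 3)` returns the input value 1000.0 exactly even though the truncation error is 5. With a 1% mismatch (`g2=0.99`), the output deviates from the input, which is the residual the text attributes to imperfect gain matching.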



Citations: 10

28.8 Multi-Modal Peripheral Nerve Active Probe and Microstimulator with On-Chip Dual-Coil Power/Data Transmission and 64 2nd-Order Opamp-Less ΔΣ ADCs

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Pub Date : 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365856

Maged ElAnsary, Jianxiong Xu, J. S. Filho, Gairik Dutta, L. Long, Aly Shoukry, Camilo Tejeiro, Chenxi Tang, Enver G. Kilinc, Jaimin Joshi, P. Sabetian, Samantha Unger, J. Zariffa, P. Yoo, R. Genov

The peripheral nervous system (PNS) enables communication between the central nervous system and various organs, for example by conveying sensory information and relaying motor commands. Electrical stimulation of peripheral nerves has been shown to be effective in treating major intractable disorders ranging from autoimmune disorders to chronic pain. It acts on specific nerves and avoids the significant side effects of most drugs. Closed-loop PNS neurostimulators offer the additional benefits of personalized and optimized treatment. Such medical devices infer physiological function from measurable nerve action potentials and deliver custom-tailored electrical stimulation to elicit desired clinical outcomes.


Citations: 5

A 5.0-to-6.36GHz Wideband-Harmonic-Shaping VCO Achieving 196.9dBc/Hz Peak FoM and 90-to-180kHz 1/f³ PN Corner Without Harmonic Tuning

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Pub Date : 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365761

Hao Guo, Yong Chen, Pui-in Mak, R. Martins

Since 2001 [1], LC VCOs have demonstrated significant improvements in figure-of-merit (FoM) and 1/f³ phase noise (PN) corner [2–5] by exploiting common-mode (CM) resonance at twice the oscillation frequency ($2F_{osc}$). In addition, for area reduction, the shaping of the impulse sensitivity function (ISF) has evolved from explicit with two coils [1] to implicit with one coil [2]. Yet, as depicted in Fig. 20.1.1, the latter suffers from large CM magnetic-flux cancellation, resulting in a much lower CM impedance $|Z_{CM}|$ that is $\sim 0.64$ of its differential-mode (DM) impedance $|Z_{DM}|$. The VCO in [3] achieves a high FoM$_{@10\mathrm{MHz}}$ of up to 191.4 dBc/Hz by boosting $|Z_{CM}|$ at $2F_{osc}$ and $|Z_{DM}|$ at $3F_{osc}$. Yet, to uphold optimal performance over the tuning range (TR), the VCO in [3] still requires manual harmonic tuning to align the 1st-to-2nd and 1st-to-3rd harmonic resonances. This denotes a narrowband effect. For the VCO in [4], which features a four-winding transformer with no harmonic tuning, there is a large variation of FoM$_{@10\mathrm{MHz}}$ (190.7 to 196.5 dBc/Hz) and 1/f³ PN corner (60 to 600kHz) across the TR.
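
For readers unfamiliar with the FoM figures quoted above, the sketch below evaluates the commonly used oscillator figure of merit, FoM = −PN + 20·log10(f_osc/Δf) − 10·log10(P_DC/1 mW). The abstract does not spell out its FoM definition, so treating it as this standard form is an assumption, and the numbers in the usage note are hypothetical rather than measured values from any cited design.

```python
import math


def vco_fom(pn_dbc_hz: float, f_osc_hz: float, f_offset_hz: float, p_dc_w: float) -> float:
    """Standard oscillator FoM in dBc/Hz (larger is better).

    pn_dbc_hz   : phase noise at the offset frequency (a negative number)
    f_osc_hz    : oscillation frequency in Hz
    f_offset_hz : offset frequency at which the phase noise is quoted, in Hz
    p_dc_w      : DC power consumption in watts
    """
    return (-pn_dbc_hz
            + 20.0 * math.log10(f_osc_hz / f_offset_hz)
            - 10.0 * math.log10(p_dc_w / 1e-3))
```

As a hypothetical data point, a 5 GHz oscillator with -150 dBc/Hz at a 10 MHz offset while drawing 10 mW would score `vco_fom(-150.0, 5e9, 10e6, 10e-3)`, about 194 dBc/Hz.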



Citations: 22

7.8 A 1-inch 17Mpixel 1000fps Block-Controlled Coded-Exposure Back-Illuminated Stacked CMOS Image Sensor for Computational Imaging and Adaptive Dynamic Range Control

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Pub Date : 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365740

T. Hirata, Hironobu Murata, Hideaki Matsuda, Yojiro Tezuka, Shiro Tsunai

In recent developments, image sensors are no longer simply a means for collecting optical signals, but rather, are increasingly expected to serve as intelligent systems with surrounding configurations. Coded exposure (CE) is one of the methods applied in intelligent systems approaches, and various functions can be realized by the selection of the integration variable in the plenoptic function. High DR can be realized if the integration variable is time. A variety of means to achieve high DR have been proposed in the literature, for example, a method that provides a plurality of detection capacitors (LOFIC, [1]) or a method of preventing saturation by adding low-sensitivity pixels [2]. These methods often require an enlarged pixel size. Alternatively, high-speed readout like an array parallel stacked structure [3] is useful for integrating multiple frames to realize high DR. However, this leads to an increase in noise and needs faster readout to reduce motion artifacts. In order to mitigate the adverse effects, a method has been proposed in which a pixel array is divided into a plurality of blocks and the signal integration time of each block is individually controlled [4]. Another method was described in which CE was demonstrated by using pixel level control of the exposure time [5]. However, in these methods, it was necessary to arrange the readout path and control circuitry within the same plane because these are non-stacked sensors, so the pixel size was relatively large and high resolution was difficult to realize. To overcome the above problems, we report a sensor that can simultaneously achieve 4K×4K resolution and 1000fps high-speed readout. Using a stacked structure, we demonstrate coded exposure capability by individually controlling exposure time for each block of pixels.
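
The idea of controlling exposure time per block to extend dynamic range can be sketched as follows. This is a simplified illustration under stated assumptions, not the sensor's actual control scheme: the saturation level `FULL_WELL`, the exposure set `EXPOSURES`, and all function names are hypothetical, and noise is ignored.

```python
FULL_WELL = 1000.0                # hypothetical saturation level (arbitrary units)
EXPOSURES = [1.0, 0.25, 0.0625]   # hypothetical exposure times, longest first


def pick_exposure(block):
    """Longest exposure that keeps the brightest pixel of the block unsaturated."""
    peak = max(max(row) for row in block)
    for t in EXPOSURES:
        if peak * t < FULL_WELL:
            return t
    return EXPOSURES[-1]          # fall back to the shortest exposure


def capture_block(block, t):
    """Integrate scene irradiance for time t, clipping at the saturation level."""
    return [[min(FULL_WELL, p * t) for p in row] for row in block]


def reconstruct_block(signal, t):
    """Normalize the captured signal by its exposure time to recover irradiance."""
    return [[s / t for s in row] for row in signal]
```

For a bright block such as `[[3000.0, 100.0], [50.0, 3900.0]]`, a single global exposure of 1.0 would clip two pixels at 1000, whereas `pick_exposure` selects 0.25 for that block and the normalized result recovers the original values.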



Citations: 8

8 An Output-Bandwidth-Optimized 200Gb/s PAM-4 100Gb/s NRZ Transmitter with 5-Tap FFE in 28nm CMOS

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Pub Date : 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9366012

Minsoo Choi, Zhongkai Wang, Kyoungtae Lee, Kwanseo Park, Zhaokai Liu, Ayan Biswas, Jaeduk Han, E. Alon

The ever-expanding demand for ultra-high-speed interconnects has driven the development of wireline TXs operating at >100Gb/s per lane [1]–[4]. This paper presents a PAM-4 TX achieving 200Gb/s with improved output bandwidth and output swing by minimizing the driver capacitance with pull-up current sources, multiplexing with flexible clock timing control, and employing a fully reconfigurable 5-tap FFE architecture.
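
To make the "5-tap FFE" term concrete, here is a behavioral sketch of Gray-coded PAM-4 mapping followed by a 5-tap feed-forward equalizer modeled as an FIR filter. It is a textbook-level model, not the transmitter's circuit: the Gray mapping, the tap weights in the usage note, and the names `pam4_map`, `ffe` and `main_index` are assumptions introduced for illustration.

```python
# Gray-coded PAM-4 levels: adjacent levels differ in exactly one bit.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def pam4_map(bits):
    """Map an even-length bit sequence to PAM-4 symbols (2 bits per symbol)."""
    if len(bits) % 2:
        raise ValueError("bit sequence length must be even")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def ffe(symbols, taps, main_index):
    """FIR equalizer: y[n] = sum_k taps[k] * x[n + main_index - k].

    Taps before main_index weight upcoming symbols (pre-cursor),
    taps after it weight earlier symbols (post-cursor).
    Symbols outside the sequence are treated as zero.
    """
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, w in enumerate(taps):
            idx = n + main_index - k
            if 0 <= idx < len(symbols):
                acc += w * symbols[idx]
        out.append(acc)
    return out
```

Because PAM-4 carries 2 bits per symbol, 200Gb/s corresponds to 100 Gsymbol/s. With hypothetical taps `[0.0, -0.1, 0.8, -0.1, 0.0]` and `main_index=2`, a single isolated symbol of amplitude 1 produces the tap weights themselves at the output, which is the pre-distorted pulse the channel then low-pass filters.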


Citations: 6

10.6 A 12b 16GS/s RF-Sampling Capacitive DAC for Multi-Band Soft-Radio Base-Station Applications with On-Chip Transmission-Line Matching Network in 16nm FinFET

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Pub Date : 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365744

Daniel Gruber, M. Clara, R. Sanchez-Perez, Yu-shan Wang, C. Duller, Gerald Rauter, Patrick Torta, K. Azadet

Future multi-band software-defined-radio base-stations for digital beamforming and massive MIMO applications depend heavily on the availability of highly linear and compact data converters with good power efficiency, while at the same time offering multi-GHz signal-bandwidth at sampling rates well in excess of 10GS/s. Wideband RF-sampling D/A-converters have traditionally been implemented in current-steering architectures, mostly with extensive calibration infrastructure [1] –[3]. The transistor stack required to achieve the necessary static and dynamic output impedance for the code-steered current sources leads to limited supply voltage scalability, while the capacitive self-loading by the current-source array makes true wideband matching at the RF-output inherently difficult. Capacitive digital-to-analog converters (C-DAC) have been widely used as RF DAC or switched-capacitor power amplifiers. Up to now digital transmitters have used C-DACs with inherent mixing functionality in polar or IQ systems for synthesis of high-power RF signals of moderate bandwidth of up to 160MHz [4] –[6]. This work uses a capacitive DAC as a direct RF-sampling DAC with moderate output power level for direct signal synthesis over a bandwidth from 0.5GHz up to at least 8GHz.
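
As background on how a capacitive DAC produces its output, the sketch below models an idealized binary-weighted capacitor array as a capacitive divider. It is not the architecture of this paper (which the abstract does not detail): the binary weighting, the lumped load capacitance `c_load`, and the function name are assumptions, and parasitics, switching dynamics and the matching network are ignored.

```python
def cdac_output(code: int, n_bits: int, v_ref: float,
                c_unit: float = 1.0, c_load: float = 0.0) -> float:
    """Ideal output voltage of a binary-weighted capacitive DAC.

    Capacitors whose bit is 1 connect to v_ref, the rest to ground, so the
    output is v_ref scaled by the switched-high capacitance over the total
    capacitance seen at the output node (array plus any load).
    """
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range for n_bits")
    c_total = (2 ** n_bits - 1) * c_unit   # sum of all binary-weighted capacitors
    c_high = code * c_unit                 # capacitance switched to v_ref
    return v_ref * c_high / (c_total + c_load)
```

For a 12b array, `cdac_output(4095, 12, 1.0)` returns full scale, and adding a load equal to the array capacitance halves it, which hints at why output loading matters for swing at multi-GHz bandwidths.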



Citations: 2

A 176-Stacked 512Gb 3b/Cell 3D-NAND Flash with 10.8Gb/mm² Density with a Peripheral Circuit Under Cell Array Architecture

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Pub Date : 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365809

Jae-Woo Park, Doogon Kim, Sunghwa Ok, Jaebeom Park, Taeheui Kwon, Hyun-Seob Lee, Sungmook Lim, Sun-Young Jung, Hyeong-Jin Choi, Taikyu Kang, Gwan Park, Chulwoo Yang, Jeong-Gil Choi, Gwihan Ko, Jae-Hyeon Shin, Ingon Yang, Junghoon Nam, H. Sohn, Seok-in Hong, Yohan Jeong, Sung-Wook Choi, Changwoon Choi, Hyun-Soo Shin, Ju-Young Lim, Dongkyu Youn, Sanghyuk Nam, Juyeab Lee, M. Ahn, Hoseok Lee, Seungpil Lee, Jongmin Park, Kichang Gwon, Woopyo Jeong, Jungdal Choi, Jinkook Kim, K. Jin

With the explosive growth of data generated by various applications, increasing storage capacity has become one of the most important topics of the current era. The evolution from 2D planar NAND to 3D NAND enables high-density storage by increasing the number of stacked word-lines (WLs) within a smaller footprint. The industry has moved beyond 96-stacked-WL devices and achieved 128-stacked 3D NAND. A 128-stacked 3b/cell 3D NAND with a density of 7.8Gb/mm² was reported recently, based on a peripheral circuit under cell array (PUC) structure [1]. Nevertheless, due to the constant demand for increased density, 3D NAND faces the following challenges [2,3]: (1) a reduced PUC area due to an increasing WL stack, (2) increased load due to a higher number of stacks and reduced spacing between WLs, (3) rising WL-channel capacitance due to an increasing number of strings, and (4) variation in the RC delay between WLs due to the non-uniformity of the plug critical dimension (CD). Not only do these problems limit the density improvement of 3D NAND, but they also increase the WL rise time, which degrades read and write performance. This paper proposes the following techniques to overcome these challenges: (1) a 12-stage page buffer (PB) with a one-to-one (1:1) PBUS (PB-to-cache connection bus), (2) a variable-stage, variable-frequency charge pump with a boosted local pump, (3) a center X-decoder (XDEC) and half-plane activation, (4) an unselected-string boosting scheme, and (5) adaptive WL overdrive (OVD). By applying these techniques, we achieved a density of 10.8Gb/mm² in a 176-stacked 3D NAND using 3b/cell.
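As a rough sanity check on the figures quoted in this abstract, the arithmetic can be sketched in Python. The function names are illustrative only, and the die-area estimate assumes that the quoted density is simply total capacity divided by total die area, which the abstract does not state explicitly.

```python
def density_gain(new_density: float, old_density: float) -> float:
    """Relative density improvement, returned as a fraction (0.1 == 10%)."""
    return new_density / old_density - 1.0


def implied_die_area_mm2(capacity_gb: float, density_gb_per_mm2: float) -> float:
    """Die area implied by capacity / density.

    Assumption: the quoted Gb/mm^2 figure is a die-level density.
    """
    return capacity_gb / density_gb_per_mm2


gain = density_gain(10.8, 7.8)          # 176-stacked vs. the 128-stacked part in [1]
area = implied_die_area_mm2(512, 10.8)  # 512Gb device at 10.8Gb/mm^2

print(f"Density gain over the 128-stacked design: {gain:.1%}")
print(f"Implied die area: {area:.1f} mm^2")
```

Under these assumptions the density gain (about 38%) is close to the increase in stack count (176/128, about 37.5%), which suggests the footprint per string stayed roughly constant while the stack grew.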



Citations: 29

26.3 A mm-Wave Power Amplifier for 5G Communication Using a Dual-Drive Topology Exhibiting a Maximum PAE of 50% and Maximum DE of 60% at 30GHz

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Pub Date : 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365830

E. Garay, D. Munzer, Hua Wang

The mm-wave spectrum is opening a new opportunity for TRx systems to operate at high-Gb/s data-rates. However, this opportunity also imposes stringent requirements on power amplifiers (PAs) in terms of efficiency and linearity. To date, all PA designs focus on increasing the peak/power-back-off (PBO) PAE and output power (max $\mathrm{P}_{out}$) by either presenting multi-harmonic terminations or improving on existing topologies, such as stacked, outphasing, and Doherty PAs [1-3]. However, in highly scaled silicon processes with low supply voltages, these reported techniques see diminishing returns on PAE and $\mathrm{P}_{out}$, since the transistor knee voltage ($\mathrm{V}_{knee}$) becomes a significant portion of the supply voltage [5]. Moreover, an extra reduction in supply voltage is often applied in practical deployment to ensure device reliability. This is especially relevant for mm-wave array operations, where array element couplings result in substantial antenna impedance mismatches and undesired large PA voltage swings [6]. Although the reported techniques have improved overall PA efficiency at mm-wave, fundamentally they are incapable of surpassing the theoretical PA core efficiency at the same conduction angle (e.g., Class-B common-source (CS) PA) without resorting to device switching or harmonic shaping.
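The knee-voltage argument in this abstract can be illustrated with a minimal Python sketch. It uses the common first-order textbook model in which an ideal Class-B stage reaches a peak drain efficiency of π/4 and the usable drain swing is reduced to V_DD − V_knee; this model, the function name, and the example voltages are assumptions for illustration and are not taken from the paper.

```python
import math


def class_b_drain_efficiency(v_dd: float, v_knee: float) -> float:
    """First-order peak drain efficiency of an ideal Class-B stage.

    Assumption: the drain swing is limited to (v_dd - v_knee), which scales
    the ideal pi/4 efficiency by (1 - v_knee / v_dd).
    """
    if not 0.0 <= v_knee < v_dd:
        raise ValueError("expected 0 <= v_knee < v_dd")
    return (math.pi / 4.0) * (1.0 - v_knee / v_dd)


# Hypothetical supply/knee pairs, chosen only to show the trend.
for v_dd, v_knee in [(2.0, 0.2), (1.0, 0.2), (0.8, 0.2)]:
    eta = class_b_drain_efficiency(v_dd, v_knee)
    print(f"V_DD={v_dd:.1f} V, V_knee={v_knee:.1f} V -> peak DE ~ {eta:.1%}")
```

With a fixed knee voltage, lowering the supply shrinks the factor (1 − V_knee/V_DD), so the achievable efficiency falls even for an otherwise ideal stage. This is the diminishing-returns effect the abstract describes for highly scaled processes.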



Citations: 10

16.3 A 28nm 384kb 6T-SRAM Computation-in-Memory Macro with 8b Precision for AI Edge Chips

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Pub Date : 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365984

Jian-Wei Su, Yen-Chi Chou, Ruhui Liu, Ta-Wei Liu, Pei-Jung Lu, P. Wu, Yen-Lin Chung, Li-Yang Hung, Jin-Sheng Ren, Tianlong Pan, Sih-Han Li, Shih-Chieh Chang, S. Sheu, W. Lo, Chih-I Wu, Xin Si, C. Lo, Ren-Shuo Liu, C. Hsieh, K. Tang, Meng-Fan Chang

Recent SRAM-based computation-in-memory (CIM) macros enable mid-to-high precision multiply-and-accumulate (MAC) operations with improved energy efficiency using ultra-small/small capacity (0.4-8KB) memory devices. However, advanced CIM-based edge-AI chips favor multiple mid/large capacity SRAM-CIM macros with high input (IN) and weight (W) precision, to reduce the frequency of data reloads from external DRAM and to avoid the need for additional SRAM buffers or ultra-large on-chip weight buffers. Enlarging memory capacity and throughput, however, increases the parasitic delay on WLs and BLs as well as the number of parallel computing elements, resulting in longer compute latency (tAC), lower energy-efficiency (EF), degraded signal margin, and larger fluctuations in power consumption across data-patterns (see Fig. 16.3.1). Recent SRAM-CIM macros tend to avoid in-lab SRAM cells with a logic-based layout in favor of foundry-provided compact-layout 8T [2], [3], [5] or 6T cells with local-computing cells (LCCs) [4], [6], to reduce the cell-array area and facilitate manufacturing. This paper presents an SRAM-CIM structure using (1) a segmented-BL charge-sharing (SBCS) scheme for MAC operations, with low energy consumption and a consistently high signal margin across MAC values (MACV); (2) a new LCC cell, called a source-injection local-multiplication cell (SILMC), to support the SBCS scheme with a consistent signal margin against transistor process variation; and (3) a prioritized-hybrid-ADC (Ph-ADC) to achieve a small area and power overhead for analog readout. A 28nm 384kb SRAM-CIM macro was fabricated using a foundry compact-6T cell, supporting MAC operations with 16 accumulations of 8b inputs and 8b weights and a near-full-precision output (20b). This macro achieves a 7.2ns tAC and a 22.75TOPS/W EF for 8b-MAC operations, with an FoM (IN-precision × W-precision × output-ratio × output-channel × EF/tAC) 6× higher than prior work.
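The 20b output width and the FoM expression quoted in this abstract can be checked with a short Python sketch. The function names are hypothetical, the bit-width calculation assumes unsigned operands, and the `output_ratio` and `output_channels` values passed to the FoM are placeholders: the abstract does not define the output ratio or give the channel count, so the printed FoM is not the paper's figure.

```python
def full_precision_output_bits(in_bits: int, w_bits: int, accumulations: int) -> int:
    """Bits needed to hold a sum of `accumulations` products of
    in_bits x w_bits operands without overflow.

    Assumption: operands are unsigned, so the worst case is every
    input and every weight at its maximum value.
    """
    max_sum = accumulations * (2**in_bits - 1) * (2**w_bits - 1)
    return max_sum.bit_length()


def cim_fom(in_bits: int, w_bits: int, output_ratio: float,
            output_channels: int, ef_tops_per_w: float, t_ac_ns: float) -> float:
    """FoM = IN-precision x W-precision x output-ratio x output-channel x EF / tAC."""
    return in_bits * w_bits * output_ratio * output_channels * ef_tops_per_w / t_ac_ns


bits = full_precision_output_bits(in_bits=8, w_bits=8, accumulations=16)
print(f"Full-precision output width: {bits} bits")

# Placeholder output_ratio and output_channels; EF and tAC are from the abstract.
fom = cim_fom(in_bits=8, w_bits=8, output_ratio=1.0, output_channels=1,
              ef_tops_per_w=22.75, t_ac_ns=7.2)
print(f"FoM with placeholder ratio/channels: {fom:.1f}")
```

Under the unsigned assumption, 16 accumulated 8b × 8b products need 8 + 8 + log2(16) = 20 bits, matching the 20b output quoted above. The FoM rewards precision and energy efficiency while penalizing latency, so a macro cannot score well by trading one of these for another.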



Citations: 76