Pub Date: 2022-11-01 DOI: 10.1109/OJSSCS.2022.3218494
Justin Yonghui Kim;Antonio Liscidini
A power-scalable RF front-end using quantized analog signal processing is presented. The front-end is based on a voltage-mode power-scalable approach which allows the power dissipation to be scaled upon the operative scenario and to perform an agile calibration for mismatch impairments. Power and input dynamic range can be scaled upon the desired 1-dB compression point (1dBCP) (from −15.3 to 0.5 dBm) while keeping the same sensitivity with 2.5-dB NF. Signal path power can vary between 3.3 and 6.4 mW while clock generation and distribution power can vary between 1.6 and 18.5 mW/GHz, with a phase noise as low as −171.2 dBc/Hz. After calibration, IM2 and IM3 improved up to 33 dB while 1dBCP improved by 1 dB, which resulted in achieving an IIP3 of 26.1 dBm and IIP2 of 71 dBm at 0-dBm 1dBCP.
Title: A Reconfigurable Power-Efficient Quantized Analog RF Front-End With Smart Calibration. Journal: IEEE Open Journal of the Solid-State Circuits Society, vol. 2, pp. 165-174.
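The linearity figures in this abstract can be compared against the textbook relation between the 1-dB compression point and IIP3. The following is a minimal sketch, assuming a memoryless amplifier modeled as y = a1*x + a3*x**3 with a compressive a3, and assuming the reported IIP3 and 1dBCP are referred to the same input. The names `THIRD_ORDER_GAP_DB` and `implied_iip3_dbm` are illustrative and do not come from the paper.

```python
import math

# For y = a1*x + a3*x**3 (a3 compressive):
#   gain drops by 1 dB when (3/4)*|a3/a1|*A**2 = 1 - 10**(-1/20)
#   IIP3 amplitude satisfies       A_IIP3**2  = (4/3)*|a1/a3|
# so A_1dB**2 / A_IIP3**2 = 1 - 10**(-1/20), a gap of about 9.6 dB.
THIRD_ORDER_GAP_DB = -10.0 * math.log10(1.0 - 10.0 ** (-1.0 / 20.0))


def implied_iip3_dbm(p1db_dbm: float) -> float:
    """IIP3 that a purely third-order model would predict from the input 1dBCP."""
    return p1db_dbm + THIRD_ORDER_GAP_DB


# Values quoted in the abstract.
reported_p1db_dbm = 0.0
reported_iip3_dbm = 26.1

excess_db = reported_iip3_dbm - implied_iip3_dbm(reported_p1db_dbm)
print(f"third-order gap: {THIRD_ORDER_GAP_DB:.2f} dB")
print(f"reported IIP3 exceeds the simple model by {excess_db:.2f} dB")
```

Under these assumptions the reported IIP3 sits well above the value a purely third-order model would give at a 0-dBm 1dBCP, which is what one would expect if the third-order products are reduced separately from whatever sets compression. The abstract attributes the IM3 improvement to calibration; this sketch does not model that mechanism.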
Pub Date: 2022-10-31 DOI: 10.1109/OJSSCS.2022.3217759
Yue Ma;Can Wu;Nicholas M. Fata;Prakhar Kumar;Sigurd Wagner;James C. Sturm;Naveen Verma
Recent progress has substantially increased the operating frequency of large-area electronic (LAE) devices. Their integration into circuits has enabled unprecedented system-level capabilities, toward future wireless applications for the Internet of Things (IoT) and 5G/6G. These exploit large dimensions and flexible form factors. In this work, we focus on giga-Hertz (GHz) zinc-oxide (ZnO) thin-film transistors (TFTs) as a foundational device for enabling GHz LAE circuits and systems. To further understand their operation and limits in the newly possible frequency regime, we incorporate the effects of temperature and of non-quasi-static (NQS) physics into the device models. We then analyze operation including these effects on a fundamental circuit block, the cross-coupled inductor-capacitor (LC) oscillator. It is used in representative LAE systems, namely, a 13.56-MHz radio-frequency identification (RFID) reader array for near-field energy transfer, and a 1-GHz phased array for far-field radiation beam steering. The co-design of devices, circuits, and systems is essential for achieving flexible and meter-scale monolithic-integrated LAE wireless systems. For these, understanding temperature limitations and the NQS effect is crucial.
Title: Device, Circuit, and System Design for Enabling Giga-Hertz Large-Area Electronics. Journal: IEEE Open Journal of the Solid-State Circuits Society, vol. 2, pp. 177-192.
Pub Date: 2022-10-25 DOI: 10.1109/OJSSCS.2022.3217019
Enne Wittenhagen;Patrick James Artz;Philipp Scholz;Friedel Gerfers
In this article, a 3-GS/s time-interleaved (TI) RF track-and-hold (TaH) amplifier designed in a 22-nm SOI technology is presented. The TaH amplifier is designed to drive an ADC, which can be either two pipeline ADCs or two rows of SAR ADCs. Both TI TaH lanes are driven by a single RF-matched wide-band bulk-controlled front-end (FE) buffer. The measured TaH amplifier has an SFDR beyond 70 dBc up to 2.5 GHz and remains above 67 dBc until 3 GHz, enabling subsampling. An overall system bandwidth of 4.5 GHz is achieved with an SNR above 55 dBFS. The ultralow-jitter clock regeneration has only 45-fs rms jitter, which does not limit the SNR up to 3 GHz. Two-tone and multitone measurements reveal a third-order intermodulation and interband nonlinearity of >72 and >82 dBFS, respectively. Off-chip calibration of offset/gain mismatch and time skew between both TaH lanes reduces interleaving spurs by >75 dBFS utilizing a 37-tap fractional-delay FIR filter. The efficient body-bias control of the technology is used to dynamically body-bias the TaH sample switch, increasing bandwidth by 10% and improving settling performance while at the same time decreasing leakage. Static body-biasing is also applied to the common-mode feedback by using the bulk as a control node. The TaH amplifier including the clock generation consumes only 178 mW from a triple 2 V/0.9 V/−0.8 V supply.
Title: A 3-GS/s RF Track-and-Hold Amplifier Utilizing Body-Biasing With >55-dBFS SNR and >67-dBc SFDR Up to 3 GHz in 22-nm CMOS SOI. Journal: IEEE Open Journal of the Solid-State Circuits Society, vol. 2, pp. 135-143.
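The statement that 45 fs of rms jitter does not limit the SNR up to 3 GHz can be checked with the standard jitter-limited SNR expression for a full-scale sine input, SNR = -20*log10(2*pi*f_in*sigma_t). The sketch below assumes that expression applies and that jitter is the only noise source considered. The function name `jitter_limited_snr_db` is illustrative.

```python
import math


def jitter_limited_snr_db(f_in_hz: float, sigma_jitter_s: float) -> float:
    """SNR ceiling (dB) set by rms sampling jitter for a full-scale sine at f_in_hz."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_jitter_s)


# Values quoted in the abstract: 45 fs rms jitter, input up to 3 GHz.
snr_at_3ghz_db = jitter_limited_snr_db(3e9, 45e-15)
print(f"jitter-limited SNR at 3 GHz: {snr_at_3ghz_db:.1f} dB")
```

With these inputs the jitter ceiling comes out a few dB above the reported SNR of more than 55 dBFS, which is consistent with the abstract's claim that the clock is not the limiting factor.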
Pub Date: 2022-10-25 DOI: 10.1109/OJSSCS.2022.3216798
Sang Min Lee;Hanjoon Kim;Jeseung Yeon;Juyun Lee;Younggeun Choi;Minho Kim;Changjae Park;Kiseok Jang;Youngsik Kim;Yongseung Kim;Changman Lee;Hyuck Han;Won Eung Kim;Rui Tang;Joon Ho Baek
For energy-efficient accelerators in data centers that leverage advances in the performance and energy efficiency of recent algorithms, flexible architectures are critical to support state-of-the-art algorithms for various deep learning tasks. Due to the matrix multiplication units at the core of tensor operations, most recent programmable architectures lack flexibility for layers with diminished dimensions, especially for inferences where a large batch axis is rarely allowed. In addition, exploiting the data reuse inherent within tensor operations for computing a single matrix multiplication is challenging. In this work, an extension of a vector processor in 14 nm is proposed, which is customized to tensor operations. The flexible architecture enables a tensorized loop to support various data layouts and different shapes and sizes of tensor operations. It also exploits all possible data reuse, including input, weight, and output. Based on the tensorized loop, fetch and reduction networks, which unicast or multicast with the ordering of both input data and processing data, can be simplified using a circuit-switching-like network with configured topology and flow control for each tensor operation. Two processing elements can be fused to optimize latency for a large model or can operate individually for throughput. As a result, various state-of-the-art models can be processed efficiently with straightforward compiler optimization, and the highest energy efficiency of 13.4 inferences/s/W on EfficientNetV2-S is demonstrated.
Title: A 64-TOPS Energy-Efficient Tensor Accelerator in 14nm With Reconfigurable Fetch Network and Processing Fusion for Maximal Data Reuse. Journal: IEEE Open Journal of the Solid-State Circuits Society, vol. 2, pp. 219-230.
Due to the low-power priority of analog delay-based computation, time-domain computing-in-memory (TD-CIM) presents a splendid potential for energy-constrained edge and IoT scenarios deploying convolutional neural networks (CNNs). However, the latency in delay-based computation is proportional to the numbers and values of multiplications-and-accumulations (MACs), bottlenecking the throughput of previous data-agnostic TD-CIM-based processors which compute complete convolutions in a fixed MAC mapping manner. First, some output activations in each layer of CNNs contribute less to the final classification results, which are insignificant and can be substituted by sums of partial MACs, with a marginal accuracy degradation. Thus, complete convolution computations lead to redundant MACs. Second, activations and weights vary with input images and models. Fixed MAC mapping leads to unbalanced MAC values on delay chains, causing long idle time and latency. To address that, we design a data-aware TD-CIM-based CNN processor, DATIC, with three techniques to reduce latency: 1) a channel-skipping TD-CIM macro to remove redundant MACs for insignificant output activations (IOAs), by storing activations stationary in SRAM bitcells and shifting weights to perform only imperative MACs; 2) a convolution-order programming unit to reduce overhead of skipping redundant MACs for IOAs with random positions on feature maps; and 3) an activation-weight-adaptive channel-mapping scheduler to balance the latency of delay chains by dynamically altering the convolution mapping manner. Implemented under TSMC 28-nm technology, DATIC achieves 622.9-GOPS throughput and 32.7-TOPS/W energy efficiency for ResNet-18 with 2-b weights and 8-b activations.
Title: DATIC: A Data-Aware Time-Domain Computing-in-Memory-Based CNN Processor With Dynamic Channel Skipping and Mapping. Journal: IEEE Open Journal of the Solid-State Circuits Society, vol. 2, pp. 244-258.
Pub Date: 2022-10-25 DOI: 10.1109/OJSSCS.2022.3216562
Jianxun Yang;Yuyao Kong;Yixuan Li;Chenfu Guo;Hao Sun;Leibo Liu;Shaojun Wei;Jun Yang;Shouyi Yin
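The DATIC abstract argues that a fixed MAC mapping leaves delay chains unevenly loaded, so the slowest chain sets the latency. The sketch below illustrates that effect only. It is not DATIC's scheduler: it uses a generic greedy longest-first assignment as a stand-in, treats each chain's latency as proportional to the sum of the MAC values mapped to it, and uses made-up channel values. The function names `fixed_mapping` and `balanced_mapping` are hypothetical.

```python
import heapq


def fixed_mapping(mac_values, num_chains):
    """Data-agnostic mapping: channel i always goes to chain i % num_chains."""
    loads = [0] * num_chains
    for i, value in enumerate(mac_values):
        loads[i % num_chains] += value
    return loads


def balanced_mapping(mac_values, num_chains):
    """Data-aware stand-in: place the largest remaining value on the least-loaded chain."""
    heap = [(0, chain) for chain in range(num_chains)]  # (current load, chain index)
    heapq.heapify(heap)
    loads = [0] * num_chains
    for value in sorted(mac_values, reverse=True):
        load, chain = heapq.heappop(heap)
        loads[chain] = load + value
        heapq.heappush(heap, (loads[chain], chain))
    return loads


# Made-up per-channel MAC values, chosen so that the fixed mapping is badly skewed.
mac_values = [9, 1, 8, 2, 7, 1, 6, 2]

fixed_loads = fixed_mapping(mac_values, num_chains=2)
balanced_loads = balanced_mapping(mac_values, num_chains=2)

# The slowest chain determines the latency of the whole step.
print("fixed mapping loads:   ", fixed_loads, "latency ~", max(fixed_loads))
print("balanced mapping loads:", balanced_loads, "latency ~", max(balanced_loads))
```

In this toy case both mappings perform the same total work, but the balanced one shortens the longest chain, which is the idle-time argument the abstract makes for an activation-weight-adaptive mapping.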
Pub Date: 2022-10-18 DOI: 10.1109/OJSSCS.2022.3215099
Patrick P. Mercier;Benton H. Calhoun;Po-Han Peter Wang;Anjana Dissanayake;Linsheng Zhang;Drew A. Hall;Steven M. Bowers
Wake-up receivers (WuRXs) offer a potentially energy-efficient means to enable asynchronous wake-up of higher power and higher performance radios without needing frequent (often energy-expensive) synchronization. Since WuRXs are typically on for a large percentage of the time, keeping their power consumption very low is critical to minimizing the total energy draw. However, this is difficult while maintaining good sensitivity, interference resiliency, and robustness, all with application-appropriate wake-up latencies and form factors. This article reviews the main challenges facing WuRXs, outlines the most popular WuRX architectures, and details essential design techniques and tradeoffs toward enabling utility in emerging applications.
Title: Low-Power RF Wake-Up Receivers: Analysis, Tradeoffs, and Design. Journal: IEEE Open Journal of the Solid-State Circuits Society, vol. 2, pp. 144-164.
Pub Date: 2022-10-13 DOI: 10.1109/OJSSCS.2022.3213772
Mohit Dandekar;Kris Myny;Wim Dehaene
This article presents the design of a readout circuit for charge-output sensor arrays integrated on a flexible substrate. The charge-integrating amplifier is built with a current-output transimpedance amplifier that includes the integrator function with reset. The charge-integrating amplifier has a fully differential internal topology, improving over single-ended design, including the feedback amplifier implemented specifically as a Nauta-transconductor. The readout circuit has been manufactured in a 3-$\mu\text{m}$