Pub Date: 2021-02-13
DOI: 10.1109/ISSCC42613.2021.9366035
Sung-jin Jung, Jeil Ryu, Wang-hyun Kim, Seunghoon Lee, Jongboo Kim, Hyelim Park, Tae-Min Jang, Haedo Jeong, Juhwa Kim, Jeongho Park, Raeyoung Kim, Jeongho Park, H. Jo, Whee Jin Kim, Jangbeom Yang, Bongjin Sohn, Yuncheol Han, Inchun Lim, Seoungjae Yoo, Changsoon Park, Dae-Geun Jang, Byung-Hoon Ko, J. Lim, Jihon Kim, Kyungho Lee, Jesuk Lee, Yongin Park, Long Yan
Incorporating LEDs of different wavelengths (400 to 1000nm), photoplethysmography (PPG) sensors allow wearable devices to monitor various health parameters such as heart rate (HR), oxygen saturation (SpO2), and blood pressure (BP). PPG sensing at the wrist is now well established. To cope with the large degree of motion disturbance present at the wrist, wrist PPG sensors use green (Gr) LEDs together with multiple photodiodes (PDs), paired with wide-dynamic-range (DR) current-sensing front-ends [1]. It is attractive to use a near-infrared (nIR) PPG sensor in a True Wireless Stereo (TWS) earbud, as the ear provides the best site to measure heart rhythm (more blood flow, a constant distance from the heart, and less motion than at the finger or wrist). However, TWS devices impose more stringent size and power constraints on the PPG sensor (Fig. 28.2.1). A promising solution [2, 3] is to integrate an array of PDs with an ADC, dramatically reducing power while also providing monolithic integration. However, the limited DR (<80dB) and poor spectral responsivity remain challenging. This work advances [1] by demonstrating a CMOS monolithic PPG sensor, and improves spectral responsivity by more than 4× (0.3A/W across 400 to 1000nm) compared to [2]. The sensor is fabricated in back-side-illumination (BSI) CMOS technology and provides 90dB DR (an 18dB improvement over [3]) while consuming only 24μW of power and 5.5mm² of silicon area.
Title: A 400-to-1000nm 24μW Monolithic PPG Sensor with 0.3A/W Spectral Responsivity for Miniature Wearables
Venue: 2021 IEEE International Solid-State Circuits Conference (ISSCC)
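The PPG abstract quotes two figures of merit, 0.3A/W responsivity and 90dB DR, that are easier to compare once converted. Below is a minimal Python sketch, assuming the textbook relation between responsivity and external quantum efficiency (QE) and the 20·log10 convention for current-domain DR. The wavelengths in the loop are arbitrary sample points, not values from the paper, and the function names are hypothetical.

```python
import math

# Exact SI values (2019 redefinition) of the constants used below.
PLANCK_H = 6.62607015e-34      # J*s
LIGHT_C = 299_792_458.0        # m/s
ELECTRON_Q = 1.602176634e-19   # C


def external_qe(responsivity_a_per_w: float, wavelength_nm: float) -> float:
    """Electrons collected per incident photon at one wavelength.

    R = QE * q * lambda / (h * c), so QE = R * h * c / (q * lambda).
    """
    wavelength_m = wavelength_nm * 1e-9
    return responsivity_a_per_w * PLANCK_H * LIGHT_C / (ELECTRON_Q * wavelength_m)


def dr_ratio(dr_db: float) -> float:
    """Linear max/min current ratio for a DR in dB (20*log10 convention)."""
    return 10.0 ** (dr_db / 20.0)


if __name__ == "__main__":
    for wl_nm in (400, 530, 660, 940, 1000):
        print(f"{wl_nm} nm: QE implied by 0.3 A/W = {external_qe(0.3, wl_nm):.2f}")
    # 90 dB (this work) versus 90 - 18 = 72 dB (the prior work it is compared with)
    print(f"90 dB -> {dr_ratio(90):,.0f}:1, 72 dB -> {dr_ratio(72):,.0f}:1")
```

The same 0.3A/W implies a higher QE at short wavelengths, because each blue photon carries more energy and a watt of blue light therefore contains fewer photons. This is why holding a fixed A/W figure down to 400nm is demanding.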
Pub Date: 2021-02-13
DOI: 10.1109/ISSCC42613.2021.9365844
Kyunghoon Kim, Joo-Hyung Chae, Jaehyeok Yang, Ji-Hyo Kang, Gang-Sik Lee, Sangyeon Byeon, Youngtaek Kim, Boram Kim, Donghoon Kim, Yeongmuk Cho, Kangmoo Choi, Hye-Lim Park, Junghwan Ji, S. Jeong, Yongsuk Joo, Jaehoon Cha, Mi-Lim Park, Hongdeuk Kim, Sijun Park, K. Kong, Sunho Kim, Sangkwon Lee, J. Chun, Hyung-Seuk Kim, S. Cha
The demand for high-performance graphics systems used for artificial intelligence continues to grow, and this trend requires graphics systems to achieve ever-higher bandwidth. Enabling GDDR6 DRAM to exceed 18Gb/s/pin [1] requires identifying and solving the factors that limit the speed of a memory interface. Prior studies have shown that the memory interface is vulnerable from signal-integrity (SI) and power-integrity (PI) perspectives, since it is a parallel interface using single-ended signaling. Furthermore, circuit schemes that mitigate process, voltage, and temperature (PVT) variations in sub-nanometer DRAM processes are required to improve performance. To achieve 24Gb/s/pin in a 1.35V DRAM process, this work proposes a GDDR6 DRAM with a half-rate clocking architecture and optimized I/O.
Title: A 24Gb/s/pin 8Gb GDDR6 with a Half-Rate Daisy-Chain-Based Clocking Architecture and IO Circuitry for Low-Noise Operation
Venue: 2021 IEEE International Solid-State Circuits Conference (ISSCC)
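At 24Gb/s/pin, the timing budget per bit is small, and the clocking ratio sets how fast the on-chip clock must toggle. The arithmetic below is a generic sketch of what half-rate clocking buys. It is not a model of the paper's daisy-chain clock distribution, and the function name is hypothetical.

```python
def clock_plan(data_rate_gbps: float) -> dict[str, float]:
    """Unit interval (UI) and clock frequency per clocking ratio.

    full-rate:    1 bit per clock cycle               -> f_clk = data rate
    half-rate:    1 bit per clock edge (2 per cycle)  -> f_clk = data rate / 2
    quarter-rate: 4 bits per cycle (multi-phase)      -> f_clk = data rate / 4
    """
    return {
        "ui_ps": 1e3 / data_rate_gbps,  # 1/(Gb/s) is ns; x1000 gives ps
        "full_rate_ghz": data_rate_gbps,
        "half_rate_ghz": data_rate_gbps / 2,
        "quarter_rate_ghz": data_rate_gbps / 4,
    }


if __name__ == "__main__":
    for rate in (18.0, 24.0):
        plan = clock_plan(rate)
        print(f"{rate:g} Gb/s/pin: UI = {plan['ui_ps']:.2f} ps, "
              f"half-rate clock = {plan['half_rate_ghz']:g} GHz")
```

Going from 18 to 24Gb/s/pin shrinks the UI from about 55.6ps to about 41.7ps. A half-rate clock at 12GHz avoids distributing a 24GHz clock, at the cost of needing both clock edges to be well balanced.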
Pub Date: 2021-02-13
DOI: 10.1109/ISSCC42613.2021.9365944
Wei Zhu, Jiawen Wang, Ruitao Wang, Yan Wang
Millimeter-wave (mm-wave) imaging radars and communication systems operating at W-band are attracting ever-increasing attention due to their high resolution and high data rates [1]–[6]. However, because of lower active-device gain and greater passive loss, most W-band transceivers (TRXs) do not integrate T/R switches and attenuators [1]–[6], which significantly degrades system performance and increases cost. In this paper, we propose a W-band bidirectional phased-array TRX front-end (FE) in 65nm CMOS technology that supports communication and radar applications and addresses three critical issues of phased-array TRX FEs in the higher mm-wave bands: 1) the large insertion loss (IL) of the T/R switch, 2) the limited resolution, and 3) the gain/phase variation of the phase shifter (PS) and attenuator. EM coupling is applied in the W-band circuit design, and coupling-based T/R switches, PSs, and attenuators are integrated in the TRX FE, achieving <1dB T/R switch IL, >12.3% peak PAE at 15.1dBm output power, and <1°/1dB phase/gain resolution with <±2.1dB/±6° gain/phase variation.
Title: A 1V W-Band Bidirectional Transceiver Front-End with <1dB T/R Switch Loss, <1°/dB Phase/Gain Resolution and 12.3% TX PAE at 15.1dBm Output Power in 65nm CMOS Technology
Venue: 2021 IEEE International Solid-State Circuits Conference (ISSCC), paper 14.5
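Phase-shifter resolution matters because a phased array steers its beam by applying a progressive phase across its elements, and every element's phase is rounded to the nearest available setting. The minimal sketch below compares the worst-case rounding error for coarse steps with the 1° resolution quoted above. It assumes an 8-element uniform linear array at half-wavelength spacing and a 17° steering angle, all hypothetical values, and the phase sign convention varies between texts.

```python
import math


def steering_phases_deg(n_elements: int, d_over_lambda: float, theta_deg: float) -> list[float]:
    """Ideal progressive phase for a uniform linear array, wrapped to [0, 360).

    Element n needs phi_n = -360 * n * (d / lambda) * sin(theta) degrees
    to point the main beam at angle theta from broadside.
    """
    s = math.sin(math.radians(theta_deg))
    return [(-360.0 * n * d_over_lambda * s) % 360.0 for n in range(n_elements)]


def quantize_phases(phases_deg: list[float], step_deg: float) -> list[float]:
    """Round each phase to the nearest available phase-shifter setting."""
    return [(round(p / step_deg) * step_deg) % 360.0 for p in phases_deg]


def worst_phase_error_deg(ideal: list[float], actual: list[float]) -> float:
    """Largest wrapped difference between ideal and quantized phases."""
    return max(abs((a - i + 180.0) % 360.0 - 180.0) for i, a in zip(ideal, actual))


if __name__ == "__main__":
    ideal = steering_phases_deg(n_elements=8, d_over_lambda=0.5, theta_deg=17.0)
    for step in (22.5, 5.625, 1.0):  # 4-bit, 6-bit, and 1-degree resolution
        err = worst_phase_error_deg(ideal, quantize_phases(ideal, step))
        print(f"step {step:>6} deg -> worst element phase error {err:.2f} deg")
```

Rounding bounds each element's error by half a step, so a 1° resolution keeps every element within 0.5° of its ideal phase.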
Pub Date: 2021-02-13
DOI: 10.1109/ISSCC42613.2021.9365788
Hongyang Jia, Murat Ozatay, Yinqi Tang, Hossein Valavi, Rakshit Pathak, Jinseok Lee, N. Verma
This paper presents a scalable neural-network (NN) inference accelerator in 16nm, based on an array of programmable cores employing mixed-signal In-Memory Computing (IMC), digital Near-Memory Computing (NMC), and localized buffering/control. IMC achieves high energy efficiency and throughput for matrix-vector multiplications (MVMs), which dominate NNs, but scaling it poses numerous challenges: technologically, moving to advanced nodes to maintain gains over digital architectures, and architecturally, supporting full execution of diverse NNs. Recent demonstrations have explored integrating IMC in programmable processors [1, 2], but have not achieved IMC-level efficiency and throughput for full NN executions. The central challenge is that IMC incurs drastically different physical design points and associated tradeoffs compared to digital engines. Namely, IMC substantially increases compute energy efficiency and HW density/parallelism, but retains the overheads of HW virtualization (state and data swapping/buffering/communication across spatial/temporal computation mappings). To overcome these overheads, the demonstrated architecture is co-designed with SW-mapping algorithms (encapsulated in a custom graph compiler) to provide efficiency across a broad range of mapping strategies.
Title: A Programmable Neural-Network Inference Accelerator Based on Scalable In-Memory Computing
Venue: 2021 IEEE International Solid-State Circuits Conference (ISSCC)
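The scalability problem can be made concrete by splitting one MVM across fixed-size IMC arrays whose partial sums are combined digitally. The sketch below is a plain-Python illustration under assumed tile sizes. The names `tiled_mvm` and `tiles_needed` are hypothetical, and the sketch does not model the paper's mixed-signal compute, ADCs, or graph compiler.

```python
import math


def tiled_mvm(weights: list[list[int]], x: list[int], tile_out: int, tile_in: int) -> list[int]:
    """y = W @ x computed one fixed-size tile at a time.

    Each tile stands in for one hypothetical IMC array holding a
    tile_out x tile_in block of W: it returns partial sums over at most
    tile_in inputs for at most tile_out outputs. Partial sums from
    different input tiles are then accumulated digitally, the role a
    near-memory-computing stage would play.
    """
    n_out, n_in = len(weights), len(x)
    y = [0] * n_out
    for r0 in range(0, n_out, tile_out):
        for c0 in range(0, n_in, tile_in):
            for r in range(r0, min(r0 + tile_out, n_out)):
                # "In-memory" partial dot product for this tile (exact here;
                # a real mixed-signal array would also quantize it).
                partial = sum(weights[r][c] * x[c] for c in range(c0, min(c0 + tile_in, n_in)))
                y[r] += partial  # "near-memory" digital accumulation
    return y


def tiles_needed(n_out: int, n_in: int, tile_out: int, tile_in: int) -> int:
    """Number of tile-sized arrays needed to hold all of W at once."""
    return math.ceil(n_out / tile_out) * math.ceil(n_in / tile_in)
```

When a layer needs more tiles than the chip provides, weights must be swapped in over time, which is one form of the HW-virtualization overhead the abstract describes. For example, a 3×5 matrix mapped onto 2×2 tiles needs `tiles_needed(3, 5, 2, 2)`, or 6, arrays.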
Pub Date: 2021-02-13
DOI: 10.1109/ISSCC42613.2021.9366000
H. Fujiwara, Y. Nien, Chih-Yu Lin, H. Pan, H. Hsu, Shin-Rung Wu, Yao-Yi Liu, Yen-Huei Chen, H. Liao, Jonathan Chang
Continued transistor scaling increases random Vt variation, which limits the minimum operating voltage (V_MIN). Furthermore, differences in fin formation among the SRAM bitcells, the peripheral circuits, and the standard logic degrade area efficiency, owing to the empty space at fin-to-fin boundaries and the required dummy structures [1]. Small-capacity memories that use the classical SRAM design suffer the most from this issue. In this paper, we propose a 5nm digital-based SRAM macro with a 16T cell that supports bit-write-mask operation. We adopted standard-cell rules for the proposed SRAM layout. The 16T cell is larger than the foundry's 6T SRAM cell; however, the total macro area of a small-capacity SRAM is smaller, because the macro contains no empty space and its peripheral circuit is simple. In addition, the proposed SRAM can be directly abutted to the standard-cell region. The proposed SRAM supports ultra-wide-range voltage operation thanks to its digital-based bitcell design.
Title: A 5nm 5.7GHz@1.0V and 1.3GHz@0.5V 4kb Standard-Cell-Based Two-Port Register File with a 16T Bitcell with No Half-Selection Issue
Venue: 2021 IEEE International Solid-State Circuits Conference (ISSCC)
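Bit-write-mask operation means that a write updates only the bits selected by a mask, while every other bit in the word keeps its stored value. In conventional 6T arrays, the unselected cells on an active wordline are the half-selected cells that such a write can disturb. The behavioral sketch below captures only this functional requirement, not the 16T circuit. The class name and the 128×32 organization are hypothetical.

```python
def masked_write(old_word: int, data: int, mask: int, width: int) -> int:
    """Bit-write-mask: bits set in `mask` take `data`; all others keep `old_word`."""
    full = (1 << width) - 1
    return ((old_word & ~mask) | (data & mask)) & full


class TwoPortRegisterFile:
    """Behavioral model: one write port with a bit mask, one read port.

    The 128 x 32 organization (4096 bits) is a hypothetical choice for
    illustration; only the 4kb capacity comes from the paper's title.
    """

    def __init__(self, words: int = 128, width: int = 32) -> None:
        self.width = width
        self.mem = [0] * words

    def write(self, addr: int, data: int, mask: int) -> None:
        self.mem[addr] = masked_write(self.mem[addr], data, mask, self.width)

    def read(self, addr: int) -> int:
        return self.mem[addr]


if __name__ == "__main__":
    rf = TwoPortRegisterFile()
    rf.write(3, 0xFFFF_FFFF, mask=0x0000_00F0)  # only bits 7..4 are written
    rf.write(3, 0x0000_0000, mask=0x0000_0010)  # clear bit 4, keep the rest
    print(hex(rf.read(3)))
```

In the usage example, the second write clears only bit 4 and leaves the other masked-out bits untouched. This retention of unselected bits is exactly what a half-select disturbance would violate in a physical array.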
Pub Date: 2021-02-13
DOI: 10.1109/ISSCC42613.2021.9365949
Zhengkun Shen, Haoyun Jiang, Fan Yang, Yixiao Wang, Zherui Zhang, Junhua Liu, H. Liao
Frequency synthesizers are critical for millimeter-wave (mm-wave) frequency-modulated continuous-wave (FMCW) radars. Sawtooth chirps with a large chirp bandwidth (BW_chirp) must be synthesized with a fast slope and high frequency linearity for accurate target detection or high-quality imaging. Fractional-N phase-locked loops (PLLs) with a two-point-modulation (TPM) scheme are widely used to synthesize fast, highly linear chirps [1]–[3]. However, for a wideband multi-bank digitally controlled oscillator (DCO), the intrinsic 1/√(LC) nonlinearity and the frequency discontinuities caused by overlaps between adjacent tuning bands introduce a significant gain mismatch between the two modulation paths of the TPM scheme and degrade chirp linearity [3], [5]. To linearize the DCO tuning curve, a piecewise-linear pre-distortion method is commonly used, as shown in Fig. 32.5.1 [3]. In this method, the overlaps are mitigated by scaling every band by the same factor SC, which assumes that the ratio of each DCO tuning band to the corresponding coarse frequency step is the same for all bands. In practice, precise matching between these bands cannot be guaranteed. Each tuning band is then linearly fitted with its average gain g_i, but non-ideal residual frequency errors may still degrade chirp linearity. As the DCO bandwidth increases, the mismatches and residual frequency errors tend to become more severe, making this method unsuitable for wideband FMCW synthesizers.
Title: A 24GHz Self-Calibrated ADPLL-Based FMCW Synthesizer with 0.01% rms Frequency Error Under 3.2GHz Chirp Bandwidth and 320MHz/μs Slope
Venue: 2021 IEEE International Solid-State Circuits Conference (ISSCC), paper 32.5
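The residual-error problem with per-band average-gain fitting follows directly from the 1/√(LC) tuning law. The minimal sketch below fits one band with its end-point average gain g_i and reports the worst residual. It assumes an ideal LC tank whose fine code adds equal capacitance steps. The 100pH/440fF tank, the step sizes, and the 256-code band are hypothetical values chosen to land near 24GHz. The sketch ignores band overlaps and the SC scaling, and it is not the paper's self-calibration.

```python
import math


def lc_freq(l_h: float, c_f: float) -> float:
    """Ideal LC-tank frequency f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


def band_fit_residuals(l_h: float, c_band: float, dc_lsb: float,
                       n_codes: int) -> tuple[float, list[float]]:
    """Fit one DCO tuning band with a single average gain g_i.

    Fine code k adds k * dc_lsb of capacitance on top of the band's
    capacitance c_band. The linear model is f(0) + g_i * k, where g_i is the
    end-point average gain in Hz per code. Returns g_i and the residual
    frequency error f(k) - (f(0) + g_i * k) for every code.
    """
    f = [lc_freq(l_h, c_band + k * dc_lsb) for k in range(n_codes)]
    g_i = (f[-1] - f[0]) / (n_codes - 1)
    return g_i, [f[k] - (f[0] + g_i * k) for k in range(n_codes)]


if __name__ == "__main__":
    L_TANK, C_BAND = 100e-12, 440e-15  # hypothetical tank, about 24 GHz
    for dc_lsb in (0.1e-15, 0.2e-15):  # doubling the band's tuning range
        g_i, res = band_fit_residuals(L_TANK, C_BAND, dc_lsb, n_codes=256)
        print(f"g_i = {g_i / 1e6:.3f} MHz/code, "
              f"worst residual = {max(abs(r) for r in res) / 1e6:.2f} MHz")
```

Doubling the band's tuning range roughly quadruples the worst residual, because the curvature error of a straight-line fit grows with the square of the span. This mirrors the abstract's point that residual errors worsen as DCO bandwidth increases. For scale, the title's 3.2GHz chirp at 320MHz/μs lasts 3.2GHz / (320MHz/μs) = 10μs.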
Pub Date: 2021-02-13
DOI: 10.1109/ISSCC42613.2021.9365751
Jongeun Park, Sungbong Park, K. Cho, Taehun Lee, Changkyu Lee, Donghyun Kim, Beomsuk Lee, Sungin Kim, H. Ji, Dongmo Im, Haeyong Park, Jinyoung Kim, J. Cha, Taehoon Kim, I. Joe, Soojin Hong, Chongkwang Chang, Jingyun Kim, W. Shim, Taehee Kim, Jamie Lee, D. Park, Euiyeol Kim, Howoo Park, Jaekyu Lee, Yitae Kim, JungChak Ahn, Youngki Chung, ChungSam Jun, Hyunchul Kim, Changrok Moon, Ho-Kyu Kang
For years, there has been a strong drive toward sub-micron pixels, despite reaching the visible-light diffraction limit, because ever-smaller CMOS image sensor (CIS) pixel pitches are required for the ever-miniaturizing camera modules of mobile devices, which incorporate more and more cameras, a few of which are dedicated to ultra-high-resolution zoomed images [1]. To that end, image sensor vendors have sought new pixel architectures and/or refined fabrication processes to avoid reduced sensitivity and increased crosstalk [2]–[4]. For example, a 0.7μm pixel sensor was demonstrated with an acceptable photodiode (PD) full-well capacity (FWC) of >6,000e- and a signal-to-noise ratio (SNR) of ~32dB, without optical/electrical crosstalk, by employing state-of-the-art full-depth deep-trench isolation (FDTI) [4]. However, further scaling requires elaborate fabrication innovations and layout ideas. At the same time, matching the previous generation in every aspect of pixel performance becomes even more difficult, e.g., dark and illuminated characteristics, and fixed-pattern and temporal noise. The latter, in particular, is associated with in-pixel source-follower (SF) amplifiers, so the electrical performance of scaled in-pixel transistors cannot be overlooked. In this paper, a 32-megapixel (MP) CIS with 0.64μm unit pixels is demonstrated with an FDTI design, and the fabrication and design innovations that maintain this performance despite scaling are discussed.
Title: 1/2.74-inch 32Mpixel-Prototype CMOS Image Sensor with 0.64μm Unit Pixels Separated by Full-Depth Deep-Trench Isolation
Venue: 2021 IEEE International Solid-State Circuits Conference (ISSCC), paper 7.9
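The link between full-well capacity, temporal noise, and SNR can be sketched with the standard shot-noise-plus-read-noise model. This sketch assumes Poisson photon statistics and a single lumped read-noise term standing in for source-follower noise. The read-noise values are hypothetical. The ~32dB figure quoted above is not derived here, because its measurement conditions are not given.

```python
import math


def pixel_snr_db(signal_e: float, read_noise_e: float) -> float:
    """Single-pixel SNR with photon shot noise plus temporal read noise.

    Shot-noise variance equals the mean signal in electrons (Poisson
    statistics); read noise, e.g. from the in-pixel source follower, adds in
    quadrature: SNR = S / sqrt(S + sigma_read^2).
    """
    return 20.0 * math.log10(signal_e / math.sqrt(signal_e + read_noise_e ** 2))


if __name__ == "__main__":
    # Shot-noise ceiling at a 6,000 e- full well: 10*log10(6000), about 37.8 dB
    print(f"at FWC: {pixel_snr_db(6000, 0.0):.1f} dB")
    for sigma in (0.5, 1.0, 2.0):  # hypothetical read-noise levels in e- rms
        print(f"10 e- signal, {sigma} e- read noise: {pixel_snr_db(10, sigma):.1f} dB")
```

FWC caps the best-case SNR, while read noise mainly hurts low-light pixels. Both must be preserved as the pixel and its SF transistor shrink.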
Pub Date: 2021-02-13
DOI: 10.1109/ISSCC42613.2021.9365927
G. Kiene, A. Catania, Ramon W. J. Overwater, P. Bruschi, E. Charbon, M. Babaie, F. Sebastiano
Quantum computers (QCs) promise significant speedup for relevant computational problems that are intractable for classical computers. QCs process information stored in quantum bits (qubits), which typically must be cooled to cryogenic temperatures. Since state-of-the-art QCs employ only a few qubits, those qubits can be driven and read out by room-temperature electronics connected to the cryogenic qubits through only a few wires. However, practical QCs will require thousands of qubits or more, making this approach impractical due to system complexity and reliability concerns. Although frequency multiplexing would reduce the number of interconnects to room temperature by fitting many qubit channels into the same physical interconnect, an excessive number of interconnects would still be required. An alternative, more scalable solution is a cryogenic electronic interface operating very close to the quantum processor, keeping the whole control loop at cryogenic temperature and hence avoiding any high-speed interconnect to room temperature. This system must comprise drivers, readout circuits (LNAs, ADCs), and a digital controller to steer quantum-algorithm execution [1]. While cryogenic CMOS (cryo-CMOS) wideband drivers and LNAs supporting qubit frequency multiplexing have been shown before [1]–[3], no wideband cryo-CMOS ADC has been demonstrated yet.
Title: A 1GS/s 6-to-8b 0.5mW/Qubit Cryo-CMOS SAR ADC for Quantum Computing in 40nm CMOS
Venue: 2021 IEEE International Solid-State Circuits Conference (ISSCC), paper 13.4
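Frequency multiplexing only pays off if one wideband ADC can digitize several readout tones at once and the digital controller can separate them. Below is a minimal sketch of that idea. It assumes an ideal 6-bit quantizer at 1GS/s (the rate and the lower resolution bound come from the title) and three hypothetical readout tones. It ignores noise, cryogenic device behavior, and the SAR architecture itself.

```python
import math


def quantize(x: float, bits: int, full_scale: float = 1.0) -> float:
    """Ideal mid-rise quantizer on [-full_scale, +full_scale), with clipping."""
    levels = 2 ** bits
    lsb = 2.0 * full_scale / levels
    code = max(-levels // 2, min(levels // 2 - 1, math.floor(x / lsb)))
    return (code + 0.5) * lsb


def tone_amplitude(samples: list[float], fs_hz: float, tone_hz: float) -> float:
    """Estimate one tone's amplitude by digital I/Q demodulation and averaging."""
    i_sum = q_sum = 0.0
    for n, s in enumerate(samples):
        phase = 2.0 * math.pi * tone_hz * n / fs_hz
        i_sum += s * math.cos(phase)
        q_sum -= s * math.sin(phase)
    return 2.0 * math.hypot(i_sum, q_sum) / len(samples)


FS = 1e9                                          # 1 GS/s
TONES = {103e6: 0.30, 161e6: 0.20, 227e6: 0.10}   # hypothetical tones: Hz -> amplitude
N = 1000                                          # 1 us record: whole cycles of each tone


def composite(n: int) -> float:
    """Frequency-multiplexed signal on one line: the sum of all readout tones."""
    return sum(a * math.cos(2.0 * math.pi * f * n / FS) for f, a in TONES.items())


if __name__ == "__main__":
    adc_out = [quantize(composite(n), bits=6) for n in range(N)]
    for f, a in TONES.items():
        est = tone_amplitude(adc_out, FS, f)
        print(f"{f / 1e6:.0f} MHz: true {a:.3f}, recovered {est:.3f}")
```

The record length is chosen so that every tone completes a whole number of cycles, which makes the tones orthogonal and lets each demodulator reject the others. In this idealized setting, the only error left is quantization.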
Pub Date: 2021-02-13 | DOI: 10.1109/ISSCC42613.2021.9365802
Hao Li, J. Sharma, Chun-Ming Hsu, G. Balamurugan, J. Jaussi
Several 400G Ethernet standards (e.g., 400G-DR4/FR4) have been developed to address the rapid increase in interconnect bandwidth (BW) demand created by data-centric computing [1]. Low-cost 100Gb/s PAM-4 optical transceivers are critical to spur high-volume adoption of these standards by data centers. While low-cost integrated silicon-photonic 100Gb/s PAM-4 transmitters have been demonstrated recently, the electronics in current receiver solutions are more disaggregated. They typically employ a standalone BiCMOS TIA IC followed by a 100G PAM-4 (ADC+DSP)-based SerDes IC (designed to equalize high-loss electrical channels), which results in higher power dissipation and package cost. To address these drawbacks, we present a single-chip 100Gb/s PAM-4 optical RX that integrates all of the RX electronics in a bulk CMOS process. While standalone 100Gb/s PAM-4 CMOS linear TIAs have been shown in prior work [2], [3], their integration with a subsequent SerDes has not yet been demonstrated.
Title: 11.6 A 100Gb/s-8.3dBm-Sensitivity PAM-4 Optical Receiver with Integrated TIA, FFE and Direct-Feedback DFE in 28nm CMOS | Venue: 2021 IEEE International Solid-State Circuits Conference (ISSCC) | DOI: 10.1109/ISSCC42613.2021.9365802
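The title names a feed-forward equalizer (FFE) and a direct-feedback decision-feedback equalizer (DFE) integrated after the TIA. A PAM-4 symbol carries 2 bits, so a 100Gb/s stream runs at 50GBd. The sketch below is a generic, symbol-spaced behavioral model of an FFE followed by a DFE and a PAM-4 slicer, not the paper's circuit. Assumptions: levels normalized to {-3, -1, +1, +3}, slicer thresholds at -2, 0, +2, and hypothetical tap values; the names `pam4_slice` and `ffe_dfe_receive` are illustrative.

```python
from typing import List, Sequence


def pam4_slice(x: float) -> float:
    """Decide the nearest normalized PAM-4 level (thresholds at -2, 0, +2)."""
    if x < -2.0:
        return -3.0
    if x < 0.0:
        return -1.0
    if x < 2.0:
        return 1.0
    return 3.0


def ffe_dfe_receive(rx: Sequence[float], ffe: Sequence[float],
                    dfe: Sequence[float]) -> List[float]:
    """Hypothetical behavioral model: causal FFE, then DFE, then PAM-4 slicer."""
    decisions: List[float] = []
    for n in range(len(rx)):
        # FFE: FIR filter over the current and past received samples.
        y = sum(c * rx[n - k] for k, c in enumerate(ffe) if n - k >= 0)
        # DFE: subtract post-cursor ISI reconstructed from past decisions.
        y -= sum(d * decisions[n - 1 - j] for j, d in enumerate(dfe) if n - 1 - j >= 0)
        decisions.append(pam4_slice(y))
    return decisions


if __name__ == "__main__":
    # Hypothetical channel: each symbol leaks 0.5x into the next symbol period.
    symbols = [3.0, -3.0, 1.0, -1.0, 3.0, 1.0]
    rx = [s + 0.5 * (symbols[i - 1] if i > 0 else 0.0) for i, s in enumerate(symbols)]
    print(ffe_dfe_receive(rx, [1.0], []))     # [3.0, -1.0, -1.0, -1.0, 3.0, 3.0]
    print(ffe_dfe_receive(rx, [1.0], [0.5]))  # [3.0, -3.0, 1.0, -1.0, 3.0, 1.0]
```

With a single 0.5 post-cursor, slicing without equalization makes three errors in six symbols, while one DFE tap matched to the post-cursor recovers the sequence. The DFE cancels ISI using noise-free past decisions instead of amplifying noise the way a high-boost linear equalizer would; the catch, which motivates fast feedback-loop circuits in hardware, is that each decision must be fed back within one symbol period.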
Pub Date: 2021-02-13 | DOI: 10.1109/ISSCC42613.2021.9366026
Hsueh-Yen Shen, Yu-Chi Lee, Tzu-Wei Tong, Chia-Hsiang Yang
Super resolution is the process of reconstructing a high-resolution (HR) image from a low-resolution (LR) one. Super-resolution technology enables high-resolution video streaming, image zoom-in, and far-object recognition. Fig. 4.7.1 shows such an application scenario: the details of videos/images are reconstructed and projected onto a higher-resolution screen, providing a better visual experience. A hardware accelerator is needed to speed up the super-resolution process enough to support real-time high-resolution video streaming. Conventionally, dictionary-based approaches, such as ANR/GR [1] and A+ [2], convert the LR image into the HR one using learned mapping functions. Neural network (NN)-based algorithms generate better-quality super-resolution images by extracting features through training [3]. However, the complexity of both the dictionary-based and the NN-based algorithms is excessively high, making them unsuitable for high-speed applications [4]. The rapid and accurate image super-resolution (RAISR) algorithm [4] was proposed to achieve comparable quality at a much faster processing speed than these previous solutions. It applies pre-learned filters to enhance the quality of a bicubic-interpolated image. Each pre-learned filter (also known as a kernel) is selected by a hash function to address structure-related details.
Title: 4.7 A 91mW 90fps Super-Resolution Processor for Full HD Images | Venue: 2021 IEEE International Solid-State Circuits Conference (ISSCC) | DOI: 10.1109/ISSCC42613.2021.9366026
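The hash-based filter selection in RAISR can be sketched in software as follows. This is a hedged, simplified model, not the processor's hardware: it assumes the hash is formed from the local gradient structure tensor of a bicubic-upscaled patch (dominant orientation, strength, coherence), as in the published RAISR algorithm. The bucket counts (24 × 3 × 3), thresholds, patch size, and all function names are hypothetical, and RAISR's per-pixel-position (upscaling phase) filter dimension is not modeled.

```python
import math
from typing import Dict, List, Sequence, Tuple

Q_ANGLE, Q_STRENGTH, Q_COHERENCE = 24, 3, 3  # hypothetical bucket counts
STRENGTH_EDGES = (0.5, 2.0)                  # hypothetical thresholds on sqrt(lambda1)
COHERENCE_EDGES = (0.25, 0.5)                # hypothetical coherence thresholds


def _bin(value: float, edges: Sequence[float]) -> int:
    """Count how many thresholds the value meets or exceeds."""
    return sum(1 for e in edges if value >= e)


def hash_patch(patch: Sequence[Sequence[float]]) -> Tuple[int, int, int]:
    """Return (angle, strength, coherence) bucket indices for a square patch."""
    a = b = d = 0.0
    n = len(patch)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            gx = (patch[i][j + 1] - patch[i][j - 1]) / 2.0  # central differences
            gy = (patch[i + 1][j] - patch[i - 1][j]) / 2.0
            a += gx * gx
            b += gx * gy
            d += gy * gy
    # Eigenvalues of the 2x2 structure tensor [[a, b], [b, d]].
    half_tr = (a + d) / 2.0
    disc = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    l1, l2 = half_tr + disc, max(half_tr - disc, 0.0)
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # dominant gradient orientation
    if theta < 0.0:
        theta += math.pi                      # fold into [0, pi)
    angle = min(int(theta / math.pi * Q_ANGLE), Q_ANGLE - 1)
    s1, s2 = math.sqrt(l1), math.sqrt(l2)
    coherence = (s1 - s2) / (s1 + s2) if s1 + s2 > 0.0 else 0.0
    return angle, _bin(s1, STRENGTH_EDGES), _bin(coherence, COHERENCE_EDGES)


def bucket_index(key: Tuple[int, int, int]) -> int:
    """Flatten (angle, strength, coherence) into a single filter-table index."""
    angle, strength, coherence = key
    return (angle * Q_STRENGTH + strength) * Q_COHERENCE + coherence


def filter_pixel(patch: Sequence[Sequence[float]],
                 filters: Dict[int, List[float]]) -> float:
    """Apply the pre-learned filter chosen by the patch hash to the flattened patch."""
    taps = filters[bucket_index(hash_patch(patch))]
    flat = [v for row in patch for v in row]
    return sum(t * v for t, v in zip(taps, flat))
```

For example, a 5×5 patch whose values increase left to right hashes to angle bucket 0 with maximum strength and coherence, while the same ramp running top to bottom lands in angle bucket 12. Because the hash is a few multiply-accumulates plus a table lookup per pixel, only one small filter is evaluated per output pixel, which is the property that makes this approach cheaper than dictionary- or NN-based methods.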