Pub Date: 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365964
Dong-Hoon Jung, Tae-Hwang Kong, Jun-Hyeok Yang, SangHo Kim, Kwang-Ho Kim, J. Park, Michael Choi, Jongshin Shin
Although the number of cores in modern microprocessors is increasing continuously for applications such as HPC and AI, the available power is strictly limited by the thermal power budget. To overcome this limitation, each core has recently been implemented with a dedicated integrated voltage regulator to increase the efficiency of power usage. A distributed digital LDO (DLDO) is a powerful solution for the integrated voltage regulator because it can supply uniform power over the entire core with reduced IR drop and helps thermal management [1–4]. In previous distributed DLDOs [1–3], even though all LDO outputs are connected to drive the power-delivery network, the LDOs operate independently using their own controllers, each of which occupies a large portion of the LDO area. Therefore, the current density of these structures is low. In [4], the distributed DLDO uses a dual-loop structure. In this scheme, high current density can be achieved because four shared global controllers control the 16 local LDOs (LLDOs) for highly accurate regulation. However, the LLDOs consume a large quiescent current since they operate at a switching frequency of several GHz for a fast transient response. In addition, the load current range is narrow due to the small switching duty-cycle range of the power FETs. Because of these drawbacks, the structure proposed in [4] has limitations in practical applications.
Title: 29.6 A Distributed Digital LDO with Time-Multiplexing Calibration Loop Achieving 40A/mm² Current Density and 1mA-to-6.4A Ultra-Wide Load Range in 5nm FinFET CMOS
Venue: 2021 IEEE International Solid-State Circuits Conference (ISSCC)
Pub Date: 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365939
D. Rossi, Francesco Conti, M. Eggimann, Stefan Mach, Alfio Di Mauro, M. Guermandi, Giuseppe Tagliavini, A. Pullini, Igor Loi, Jie Chen, E. Flamand, L. Benini
The Internet-of-Things requires end-nodes with ultra-low-power always-on capability for long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an always-on IoT end-node SoC capable of scaling from a 1.7μW fully retentive cognitive sleep mode up to 32.2GOPS (@49.4mW) peak performance on NSAAs, including mobile DNN inference, exploiting 1.6MB of state-retentive SRAM and 4MB of non-volatile MRAM. To meet the performance and flexibility requirements of NSAAs, the SoC features 10 RISC-V cores: one core for SoC and IO management and a 9-core cluster supporting multi-precision SIMD integer and floating-point computation. Two programmable machine-learning (ML) accelerators boost energy efficiency in sleep and active state, respectively.
Title: 4.4 A 1.3TOPS/W @ 32GOPS Fully Integrated 10-Core SoC for IoT End-Nodes with 1.7μW Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode
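The peak figures quoted in the abstract can be turned into an efficiency and a power-scaling ratio with simple unit conversions. The sketch below only does that arithmetic; the function names are illustrative and not from the paper, and the resulting ratio need not match the efficiency in the title, which may refer to a different workload or operating point.

```python
# Figures quoted in the abstract.
PEAK_THROUGHPUT_GOPS = 32.2   # peak performance on NSAAs
PEAK_POWER_MW = 49.4          # power drawn at that peak
SLEEP_POWER_UW = 1.7          # cognitive sleep mode


def tops_per_watt(gops: float, milliwatts: float) -> float:
    """Convert a throughput in GOPS and a power in mW to TOPS/W."""
    ops_per_second = gops * 1e9
    watts = milliwatts * 1e-3
    return ops_per_second / watts / 1e12


def power_scaling_ratio(active_mw: float, sleep_uw: float) -> float:
    """Dimensionless ratio of active power to sleep power."""
    return (active_mw * 1e-3) / (sleep_uw * 1e-6)


if __name__ == "__main__":
    eff = tops_per_watt(PEAK_THROUGHPUT_GOPS, PEAK_POWER_MW)
    ratio = power_scaling_ratio(PEAK_POWER_MW, SLEEP_POWER_UW)
    print(f"Efficiency at the quoted peak: {eff:.3f} TOPS/W")
    print(f"Active-to-sleep power ratio: {ratio:.0f}x")
```

At the quoted peak this works out to roughly 0.65 TOPS/W, with the active power about 29,000 times the sleep power.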
Pub Date: 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9366025
Xiaomin Li, Yibo Xu, Lizheng Ren, Weiwei Ge, Jianlong Cai, Xinning Liu, Jun Yang
Limited by battery capacity, advanced MCUs for IoT applications require ultra-low power consumption. In a conventional design, most modules except the crystal oscillator (XO32), real-time clock (RTC), and retention memory are turned off to reduce the current in the sleep state, but the sleep power still accounts for most of the total power consumption. When the load current is reduced to ~100 nA, the transient current of a switched-capacitor voltage regulator (SCVR) remains unchanged (~100 nA), so that power efficiency is low and the sleep current cannot be reduced further. Voltage stacking has been proposed to address power efficiency [1, 2]. Prior voltage-stacking architectures could not realize dynamic switching between flat mode and stack mode, leading to high dynamic power in the normal state. In addition, the SCVR still consumes some power during the sleep state [3]. This paper proposes a dynamic voltage-stacking scheme, which supports two operating modes: a flat mode in the normal state and a stack mode in the sleep state. In the flat mode, the retention memory, RTC, and XO32 are connected in parallel and are powered by the SCVR. In the stack mode, the four instances are connected in series, including the SRAM1 (level1), the SRAM2 (level2), the XO32, and the RTC (level3), and the on-chip SCVR is shut down for power saving.
Title: 29.8 115nA@3V ULPMark-CP Score 1205 SCVR-Less Dynamic Voltage-Stacking Scheme for IoT MCU
Pub Date: 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365997
Atharav Atharav, B. Razavi
The power consumption of wireline transceivers has become increasingly critical as higher data rates and a larger number of lanes per chip are sought [1]–[6]. While attractive for lossy channels, PAM-4 signaling has mostly dictated ADC-based receivers (RXs) and relatively high power consumption [1], [2]. Non-return-to-zero (NRZ) receivers, on the other hand, can be realized in the analog domain, potentially consuming less power, but they must deal with a greater loss. This paper introduces an NRZ RX that achieves more than a twofold reduction in power while exhibiting BER < 10⁻¹² for a channel loss of 25dB at 28GHz. The proposed design can compete with PAM-4 counterparts and/or serve in 112Gb/s systems that must also support 56Gb/s reception. Figure 11.7.1 shows the RX architecture. The data path consists of a CTLE core, a DFE core, a discrete-time linear equalizer (DTLE) [4], and a DMUX. The receiver performance is greatly improved by a number of feedforward and feedback paths. Also proposed is a half-rate “band-pass” CDR that avoids loading the main data path and the use of quadrature VCOs.
Title: 11.7 A 56Gb/s 50mW NRZ Receiver in 28nm CMOS
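The headline numbers of this entry relate to each other through two short calculations: energy per bit is power divided by data rate, and the Nyquist frequency of an NRZ stream (one bit per symbol) is half the bit rate, which is why the 56Gb/s channel loss is quoted at 28GHz. The following is a minimal sketch of that arithmetic; the function names are illustrative.

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per received bit in pJ, from power in mW and rate in Gb/s."""
    joules_per_bit = (power_mw * 1e-3) / (data_rate_gbps * 1e9)
    return joules_per_bit * 1e12


def nrz_nyquist_ghz(data_rate_gbps: float) -> float:
    """Nyquist frequency of an NRZ stream: half the bit rate."""
    return data_rate_gbps / 2.0


if __name__ == "__main__":
    # 56 Gb/s and 50 mW are the figures given in the entry's title.
    print(f"{energy_per_bit_pj(50.0, 56.0):.3f} pJ/b")
    print(f"{nrz_nyquist_ghz(56.0):.1f} GHz")
```

With the title's figures this gives a little under 0.9 pJ/b.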
Pub Date: 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365941
Qiang Zhou, Yan He, Kaiyuan Yang, T. Chi
It is projected that 75 billion Internet-of-Things (IoT) devices will be deployed for applications such as wearable electronics and smart homes by 2025. Securing IoT devices is one of the most significant barriers we need to overcome for large-scale IoT adoption. Conventional wireless security has been implemented solely using upper-layer cryptography [1]. Unfortunately, IoT nodes are often energy-constrained and may not have enough computational resources to implement advanced asymmetric cryptographic algorithms and public-key infrastructures (PKI) [2]–[3]. To overcome this challenge, there has been growing interest in leveraging the physical impairments of the radios that are bound to a specific TX for secure identification [4]–[6], a.k.a. RF fingerprinting. If Bob (the RX) has sufficient sensitivity, it can distinguish Alice (the legitimate TX) from a malicious impersonator during demodulation based on their inherent radio signatures, similar to how we distinguish different people based on their unique voice signatures (Fig. 12.3.1). As the device-dependent radio impairments come from process variation, they are challenging for impersonators to forge in practice. In addition, unlike the conventional identification approach, in which device IDs are inserted in preambles and checked only once in a while, RF fingerprinting enables continuous identification at any moment during communication, leading to a tighter bond between the data packet and the device.
Title: 12.3 Exploring PUF-Controlled PA Spectral Regrowth for Physical-Layer Identification of IoT Nodes
Pub Date: 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365838
K. Dandu, S. Samala, K. Bhatia, M. Moallem, Karthik Subburaj, Z. Ahmad, Daniel Breen, Sunhwan Jang, Tim Davis, Mayank Singh, Shankar Ram, Vashishth Dudhia, M. Dewilde, Dheeraj Shetty, J. Samuel, Z. Parkar, Cathy Chi, Pilar Loya, Zachary Crawford, John B. Herrington, R. Kulak, Abhinav Daga, Rakesh Raavi, R. Teja, Rajesh Veettil, Daniel Khemraj, Indu Prathapan, P. Narayanan, Naveen Narayanan, Sangamesh Anandwade, Jasbir Singh, V. Srinivasan, Neeraj P. Nayak, K. Ramasubramanian, B. Ginsburg, V. Rentala
Millimeter-wave (mm-Wave) radar sensors operating in the 76-to-81GHz band are a key component of advanced driver-assistance systems (ADAS) for enhanced automotive safety. The recent entry of CMOS solutions in this space has accelerated development of multi-mode radars that can support long-, medium-, and short-range applications [1–3]. As ADAS applications evolve to support higher levels of autonomy, there is increased demand on radar sensors for improved maximum range, velocity, and angular resolution. Emerging automotive in-cabin occupancy sensing applications are creating opportunities for short-range, high-resolution sensors operating in 60/77GHz bands (depending on the regulatory market). The unlicensed 60GHz band has also enabled industrial sensing opportunities across diverse markets such as robotics, building automation, and healthcare. Several of these broad-market applications require inexpensive, small-form-factor sensors that can be deployed on low-cost PCBs (e.g., FR4) without expertise in mm-Wave design. In this paper, we describe our high-performance 76-to-81GHz FMCW automotive radar that supports multi-chip cascading to enable higher angular resolution and a compact 57-to-64GHz single-chip radar with integrated antennas on package. All devices are built on a 45nm bulk CMOS technology with 9 metal layers and packaged using flip-chip BGA technology.
Title: High-Performance and Small Form-Factor mm-Wave CMOS Radars for Automotive and Industrial Sensing in 76-to-81GHz and 57-to-64GHz Bands
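For context on why the available bandwidth matters, the textbook FMCW range-resolution relation is ΔR = c / (2B), where B is the swept chirp bandwidth. The sketch below applies it to the two bands named in the abstract under the assumption that a single chirp sweeps the entire band; the abstract does not state the actual chirp bandwidths, so the results are upper bounds on what the bands allow rather than reported device performance.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def band_width_hz(f_low_ghz: float, f_high_ghz: float) -> float:
    """Width of a frequency band in Hz, given its edges in GHz."""
    return (f_high_ghz - f_low_ghz) * 1e9


def fmcw_range_resolution_m(sweep_bandwidth_hz: float) -> float:
    """Textbook FMCW range resolution: c / (2 * B)."""
    if sweep_bandwidth_hz <= 0:
        raise ValueError("sweep bandwidth must be positive")
    return SPEED_OF_LIGHT_M_PER_S / (2.0 * sweep_bandwidth_hz)


if __name__ == "__main__":
    # Assumption: the chirp covers the whole band in each case.
    for name, lo, hi in [("76-to-81GHz", 76.0, 81.0), ("57-to-64GHz", 57.0, 64.0)]:
        res_cm = fmcw_range_resolution_m(band_width_hz(lo, hi)) * 100.0
        print(f"{name}: {res_cm:.2f} cm")
```

Under that full-band assumption, the 5GHz and 7GHz bands correspond to resolutions of about 3.0cm and 2.1cm, respectively.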
Pub Date: 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9366010
Preethi Padmanabhan, Chao Zhang, M. Cazzaniga, Baris C. Efe, A. Ximenes, Myung-Jae Lee, E. Charbon
3D vision is an increasingly important feature in many applications of consumer, automotive, industrial, and medical imaging. Long range, high depth resolution, high spatial resolution, and high frame rates are often conflicting requirements and difficult to achieve simultaneously, especially in extreme ambient light conditions. In order to address range and depth resolution, direct time-of-flight has emerged as a powerful technique to perform light detection and ranging (LiDAR), thanks to advances in low-jitter optical detectors, such as single-photon avalanche diodes (SPADs), and accurate time-to-digital converters (TDCs) [1]–[5]. High spatial resolution can be achieved using scanning, at a cost of system complexity and somewhat lower frame rates [1], [3], while FLASH sensors [2], [4], [5] offer an alternative for both high frame rates and large pixel counts, but only under limited ambient light conditions, due to typically long exposure times.
Title: 7.4 A 256×128 3D-Stacked (45nm) SPAD FLASH LiDAR with 7-Level Coincidence Detection and Progressive Gating for 100m Range and 10klux Background Light
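Direct time-of-flight converts a measured round-trip delay t into distance via d = c·t / 2, so range and TDC timing resolution map directly onto time. The sketch below shows that conversion; the 100m range comes from the entry's title, while the 100ps TDC step is a hypothetical value chosen only for illustration and is not taken from the paper.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def round_trip_time_s(distance_m: float) -> float:
    """Time for light to reach a target at distance_m and return."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


def distance_from_delay_m(delay_s: float) -> float:
    """Direct time-of-flight distance: d = c * t / 2."""
    return SPEED_OF_LIGHT_M_PER_S * delay_s / 2.0


if __name__ == "__main__":
    # 100 m is the range quoted in the title.
    print(f"Round trip for 100 m: {round_trip_time_s(100.0) * 1e9:.1f} ns")
    # Hypothetical TDC step of 100 ps, for illustration only.
    hypothetical_tdc_lsb_s = 100e-12
    depth_cm = distance_from_delay_m(hypothetical_tdc_lsb_s) * 100.0
    print(f"Depth per hypothetical 100 ps step: {depth_cm:.2f} cm")
```

A 100m target corresponds to a round trip of roughly 667ns, which sets the minimum time window the sensor has to cover per laser pulse.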
Pub Date: 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365975
Ravi Shivnaraine, Marcus van Ierssel, K. Farzan, D. DiClemente, G. Ng, Nanyan Y. Wang, J. Musayev, Gairik Dutta, M. Shibata, A. Moradi, H. Vahedi, Manavi Farzad, Prabhnoor Kainth, Matt Yu, N. Nguyen, J. Pham, A. McLaren
The increasing connectivity of devices in our daily lives has driven the need for higher bandwidth in networks and data centers. Recently, we have seen the development of 112Gb/s SerDes, particularly for long-reach interfaces [1–3]. In high-density switch ASICs, we see an increasing demand to improve both area efficiency (mm²/lane) and signaling efficiency (pJ/b) [1–6]. In a switch ASIC, keeping the SerDes power low translates into broader system power savings since additional power and cost for cooling can be limited or even avoided entirely. One path forward to achieve these important system gains is co-packaged optics (CPO) with an extra-short-reach (XSR) interface. In these applications, the switch ASIC and optical engine are no more than 50mm apart, which represents a total loss of approximately 10dB at 106.25Gb/s.
Title: 11.2 A 26.5625-to-106.25Gb/s XSR SerDes with 1.55pJ/b Efficiency in 7nm CMOS
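Signaling efficiency in pJ/b multiplied by the data rate gives per-lane power, which is what drives the cooling budget mentioned in the abstract. The sketch below performs that conversion, assuming the 1.55pJ/b figure from the title applies at the top 106.25Gb/s rate; the entry does not say at which rate the efficiency was measured, so treat the result as indicative only.

```python
def lane_power_mw(efficiency_pj_per_bit: float, data_rate_gbps: float) -> float:
    """Per-lane power in mW from energy per bit (pJ/b) and data rate (Gb/s)."""
    watts = (efficiency_pj_per_bit * 1e-12) * (data_rate_gbps * 1e9)
    return watts * 1e3


if __name__ == "__main__":
    # Assumption: the title's 1.55 pJ/b holds at the top rate of 106.25 Gb/s.
    print(f"{lane_power_mw(1.55, 106.25):.1f} mW per lane")
```

Under that assumption a lane draws on the order of 165mW, and the total SerDes power of a switch ASIC scales with its lane count.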
Pub Date: 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365860
Abdullah Abdulslam, P. Mercier
Inductive DC-DC converters are fundamentally limited by the trade-off between conduction losses and switching losses. Miniaturized converters used in applications such as mobile devices suffer badly from this trade-off, as a small inductor has a large DCR, which contributes large I²·R_DCR conduction losses, while a small inductance requires high-frequency operation, which implies high C_GATE·V²·f hard-charging switching losses from the power MOSFET gate drivers. Interestingly, the rise/fall time of such drivers cannot be too rapid, regardless of switching frequency, because inductive ringing can cause voltage stress [1], [2]. To ease the conduction/switching loss trade-off, it is possible to exploit the requirement for finite rise/fall time by replacing conventional hard-switching gate drivers with adiabatic charge-recycling (CR) gate drivers. As depicted in Fig. 17.5.1 (top right), CR can, with the help of inductor L_R, recycle the charge stored on C_GATE to another capacitance, C_STORE (and vice versa), theoretically with 100% efficiency. This approach was demonstrated in [3], where the charge on the power MOSFET gates is recycled to two auxiliary capacitors through two separate inductors (Fig. 17.5.1, top left). However, besides the overhead of two inductors, recycling with separate storage capacitors introduces indirect losses, while the separate duty-cycled resonant gate drivers make non-overlap timing control between power MOSFETs difficult. By AC-coupling the power NMOS to the resonant gate driver as in [4] (Fig. 17.5.1, bottom left), it is possible to reduce the number of resonant inductors to 1. However, the non-overlap time cannot be precisely controlled, leading to potentially large overlap losses, and the limited duty-cycle control through driver slope modulation prevents robust regulation across a wide output range.
Title: 17.5 A 98.2%-Efficiency Reciprocal Direct Charge Recycling Inductor-First DC-DC Converter
Pub Date: 2021-02-13 DOI: 10.1109/ISSCC42613.2021.9365978
A. Ghosh, D. Das, Josef Danial, V. De, Santosh K. Ghosh, Shreyas Sen
Mathematically secure cryptographic algorithms leak side-channel information in the form of correlated power and electromagnetic (EM) signals, leading to physical side-channel analysis (SCA) attacks. Circuit-level countermeasures against power/EM SCA include a current equalizer [1], a series LDO [2], and an IVR [3], which enhance protection up to 10M traces. Recently, current-domain signature attenuation [4] and a randomized NL-LDO cascaded with arithmetic countermeasures [5] achieved $>1\mathrm{B}$ minimum traces to disclosure (MTD) with a single countermeasure and two countermeasures, respectively. Among these, the highest protection with a single strategy is achieved using signature attenuation [4], [6], which uses a current source to make the supply current mostly constant. While highly resilient to SCA, [4] required analog-biased cascode current sources and an analog bleed path, making it difficult to scale across technology generations. Conversely, [2], [5] are synthesizable, but a single countermeasure achieved only moderate protection (up to 10M MTD). This work embraces the concept of signature attenuation in the current domain, but makes it fully synthesizable with digital current sources, a digital control loop, and a digital bleed path, increasing the MTD from 10M [5] to $250\mathrm{M}$ ($25\times$ improvement, Fig. 36.2.1) using a single synthesizable countermeasure. Finally, combining the digital signature attenuation circuit (DSAC) with a second synthesizable generic technique in the form of a time-varying transfer function (TVTF), this work achieves an MTD $>1.25\mathrm{B}$ for both EM and power SCA.
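To give a feel for why attenuating the correlated signature raises the MTD, the sketch below uses a textbook rule of thumb from the correlation-analysis literature: the number of traces needed to detect a correlation $\rho$ is roughly $3 + 8\,(z / \ln\frac{1+\rho}{1-\rho})^2$, where $z$ is a normal quantile for the chosen confidence. This model, the function names, and the example correlation values are assumptions introduced here; the abstract does not state how its MTD figures relate to correlation.

```python
import math
from statistics import NormalDist


def traces_needed(rho, confidence=0.9999):
    """Rule-of-thumb trace count to detect correlation rho (0 < rho < 1).

    Assumed textbook estimate, not the method used in the paper.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(confidence)        # one-sided normal quantile
    fisher = math.log((1.0 + rho) / (1.0 - rho))
    return 3.0 + 8.0 * (z / fisher) ** 2


def correlation_reduction_for_gain(mtd_gain):
    """For small rho the estimate scales as 1/rho^2, so an MTD gain g
    corresponds to reducing the correlation by about sqrt(g)."""
    return math.sqrt(mtd_gain)


if __name__ == "__main__":
    # Hypothetical correlation values, chosen only to show the scaling.
    for rho in (0.002, 0.001):
        print(f"rho = {rho}: about {traces_needed(rho):.3g} traces")
    print("correlation reduction for a 25x gain:", correlation_reduction_for_gain(25))
```

Under this simplified model, halving the observable correlation roughly quadruples the required traces, so a $25\times$ MTD gain would correspond to about a $5\times$ reduction in correlation. This is a property of the assumed model, not a measured result from the chip.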
36.2 An EM/Power SCA-Resilient AES-256 with Synthesizable Signature Attenuation Using Digital-Friendly Current Source and RO-Bleed-Based Integrated Local Feedback and Global Switched-Mode Control. 2021 IEEE International Solid-State Circuits Conference (ISSCC).