Pub Date: 2012-04-03 DOI: 10.1109/ISSCC.2012.6177085
Yao-Hong Liu, Xiongchuan Huang, M. Vidojkovic, K. Imamura, P. Harpe, G. Dolmans, H. D. Groot
This paper presents an ultra-low-power (ULP) 2.3/2.4GHz multi-standard transmitter (TX) for wireless sensor networks and wireless body area networks. Several 2.3/2.4GHz wireless standards have been proposed for such applications, including IEEE802.15.6 (BAN) for body area networks, IEEE802.15.4 (Zigbee) and Bluetooth Low Energy (BLE) for sensor networks, and IEEE802.15.4g (SUN) for smart buildings. Recent standard-compliant short-range TXs [1-6] typically consume 20 to 50mW of DC power, which is rather high for autonomous systems with limited battery energy. Implemented in 90nm CMOS technology, the presented TX saves at least 75% of power consumption by replacing several power-hungry analog blocks with digitally-assisted circuits. The TX is compliant with all four of these standards while drawing only 4.5mA from a 1.2V supply.
Title: A 2.7nJ/b multi-standard 2.3/2.4GHz polar transmitter for wireless sensor networks. Published in: 2012 IEEE International Solid-State Circuits Conference.
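The supply figures in the abstract above and the energy-per-bit figure in the title can be related with a little arithmetic. The sketch below assumes that the 2.7nJ/b figure is simply total TX power divided by data rate; the resulting data rate is therefore an implied value, not one stated in the abstract. The function names are illustrative only.

```python
# Figures taken from the abstract and the paper title above.
SUPPLY_VOLTAGE_V = 1.2       # supply voltage
SUPPLY_CURRENT_A = 4.5e-3    # current drawn by the TX
ENERGY_PER_BIT_J = 2.7e-9    # 2.7 nJ/b, from the title


def dc_power_w(voltage_v: float, current_a: float) -> float:
    """DC power drawn from the supply, P = V * I."""
    return voltage_v * current_a


def implied_data_rate_bps(power_w: float, energy_per_bit_j: float) -> float:
    """Data rate at which power_w corresponds to energy_per_bit_j.

    Assumption: energy per bit = total power / data rate.
    """
    return power_w / energy_per_bit_j


if __name__ == "__main__":
    power = dc_power_w(SUPPLY_VOLTAGE_V, SUPPLY_CURRENT_A)
    rate = implied_data_rate_bps(power, ENERGY_PER_BIT_J)
    print(f"DC power: {power * 1e3:.2f} mW")
    print(f"Implied data rate: {rate / 1e6:.2f} Mb/s")
```

Under that assumption, the 4.5mA and 1.2V figures give a DC power of about 5.4mW, which sits well below the 20 to 50mW range quoted for earlier transmitters.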
Pub Date: 2012-04-03 DOI: 10.1109/ISSCC.2012.6176957
Pradeep Shettigar, S. Pavan
We propose design techniques that enable the realization of power-efficient single-bit CT-ΔΣ ADCs at multi-Gb/s speeds. An FIR DAC [1] is used to reduce sensitivity to clock jitter and relax loop filter linearity requirements. A mostly analog path compensates the modulator for the delay introduced by the FIR DAC. The CTDSM samples at 3.6GS/s, has 83dB DR in 36MHz BW, and occupies 0.12mm² in 90nm CMOS. Dissipating 15mW from a 1.2V supply, it achieves an FoM_SNDR of 72.8fJ/level, an improvement over the state of the art for converters with bandwidths greater than 20MHz.
Title: A 15mW 3.6GS/s CT-ΔΣ ADC with 36MHz bandwidth and 83dB DR in 90nm CMOS
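The figure of merit quoted in the abstract above can be cross-checked against the power and bandwidth figures. The abstract does not define its FoM, so the sketch below assumes the common Walden form, FoM = P / (2 · BW · 2^ENOB), with ENOB derived from SNDR as (SNDR - 1.76) / 6.02. Under that assumption the effective number of bits and the SNDR are implied values rather than numbers taken from the abstract. The function names are illustrative.

```python
import math

# Figures taken from the abstract above.
POWER_W = 15e-3
BANDWIDTH_HZ = 36e6
SAMPLE_RATE_HZ = 3.6e9
FOM_J_PER_LEVEL = 72.8e-15


def oversampling_ratio(sample_rate_hz: float, bandwidth_hz: float) -> float:
    """OSR = fs / (2 * BW)."""
    return sample_rate_hz / (2.0 * bandwidth_hz)


def implied_enob(power_w: float, bandwidth_hz: float, fom_j_per_level: float) -> float:
    """Solve the assumed Walden FoM, P / (2 * BW * 2**ENOB), for ENOB."""
    levels = power_w / (2.0 * bandwidth_hz * fom_j_per_level)
    return math.log2(levels)


def enob_to_sndr_db(enob: float) -> float:
    """Standard conversion SNDR = 6.02 * ENOB + 1.76 (dB)."""
    return 6.02 * enob + 1.76


if __name__ == "__main__":
    enob = implied_enob(POWER_W, BANDWIDTH_HZ, FOM_J_PER_LEVEL)
    print(f"OSR: {oversampling_ratio(SAMPLE_RATE_HZ, BANDWIDTH_HZ):.1f}")
    print(f"Implied ENOB: {enob:.2f} bits")
    print(f"Implied SNDR: {enob_to_sndr_db(enob):.1f} dB")
```

The implied SNDR comes out noticeably below the 83dB DR, which is expected: dynamic range and peak SNDR are different measurements.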
Pub Date: 2012-04-03 DOI: 10.1109/ISSCC.2012.6176965
Michiel C. M. Soer, E. Klumperink, B. Nauta, F. V. Vliet
Phased arrays in CMOS for consumer communication bands aim to enhance receiver performance by exploiting beamforming with antenna arrays. Sensitivity increases with the number of antenna elements through array gain, and interferers can be cancelled through the spatial filtering of the beam pattern [1]. For the latter, the linearity of the receiver before the beamforming summing point becomes a bottleneck, because interferers have not yet been cancelled at that point. Phase shifting in the LO domain reduces the complexity of the signal path and enables the use of linear signal blocks, but places stringent requirements on the multiphase LO generation [2]. On the other hand, a switched-capacitor phase shifter can be very linear, but is limited by the linearity of the necessary input matching and element-summing gm-stages [3]. This paper proposes a fully passive phased-array receiver front-end that implements impedance matching, phase shifting and element summing with only switched-capacitor stages for high linearity.
Title: A 1.5-to-5.0GHz input-matched +2dBm P1dB all-passive switched-capacitor beamforming receiver front-end in 65nm CMOS
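The two beamforming benefits named in the abstract above, array gain and spatial filtering, can be illustrated with an idealised model. The sketch below assumes a narrowband uniform linear array with half-wavelength spacing and ideal phase shifters; it does not model the paper's switched-capacitor circuit. The element count and angles are hypothetical.

```python
import cmath
import math


def array_response(num_elements: int, steer_deg: float, arrival_deg: float,
                   spacing_wavelengths: float = 0.5) -> float:
    """Normalised magnitude response of an ideal uniform linear array.

    Element n is phase-shifted so that a signal arriving from steer_deg
    adds in phase at the summing point. Returns 1.0 in the steered direction.
    """
    k = 2.0 * math.pi * spacing_wavelengths
    delta = math.sin(math.radians(arrival_deg)) - math.sin(math.radians(steer_deg))
    total = sum(cmath.exp(1j * k * n * delta) for n in range(num_elements))
    return abs(total) / num_elements


def array_gain_db(num_elements: int) -> float:
    """SNR improvement from summing N elements with uncorrelated noise."""
    return 10.0 * math.log10(num_elements)


if __name__ == "__main__":
    n = 4  # hypothetical element count
    print(f"Array gain: {array_gain_db(n):.2f} dB")
    print(f"Response at steered angle: {array_response(n, 0.0, 0.0):.3f}")
    print(f"Response to interferer at 30 deg: {array_response(n, 0.0, 30.0):.3e}")
```

In this model an interferer that falls on a null of the beam pattern is cancelled only after the summing point, which is why every block before the sum still has to handle it linearly.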
Pub Date: 2012-04-03 DOI: 10.1109/ISSCC.2012.6177019
Jerald Yoo, Long Yan, D. El-Damak, Muhammad Awais Bin Altaf, Ali H. Shoeb, H. Yoo, A. Chandrakasan
Tracking seizure activity to determine proper medication requires a small-form-factor, ultra-low-power sensor with continuous EEG classification. Technical challenges arise from: 1) patient-to-patient variation of seizure patterns on EEG, 2) fully integrating ultra-low-power, variable-dynamic-range instrumentation circuits with a seizure detection processor, and 3) reducing communication overhead. Reference [1] extracted EEG features locally on-chip to reduce the data being transmitted, and saved power by 1/14 compared with raw EEG data transmission. However, it still needs data transmission and off-chip classification to detect and store seizure activity. This paper presents an ultra-low-power scalable EEG acquisition SoC for continuous seizure detection and recording with a fully integrated patient-specific Support Vector Machine (SVM)-based classification processor.
Title: An 8-channel scalable EEG acquisition SoC with fully integrated patient-specific seizure classification and recording processor
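To make the classification step in the abstract above concrete, the following is a minimal sketch of how a trained SVM labels one feature vector. It assumes a linear kernel, which the abstract does not specify, and the weights, bias and features are made-up values standing in for a patient-specific model trained offline.

```python
from typing import Sequence


def svm_decision(weights: Sequence[float], bias: float,
                 features: Sequence[float]) -> float:
    """Linear SVM decision value: w . x + b."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def is_seizure(weights: Sequence[float], bias: float,
               features: Sequence[float]) -> bool:
    """Classify a feature vector; a positive decision value means 'seizure'."""
    return svm_decision(weights, bias, features) > 0.0


if __name__ == "__main__":
    # Hypothetical patient-specific model and two hypothetical EEG feature vectors.
    weights = [0.5, -0.25]
    bias = -0.1
    print(is_seizure(weights, bias, [1.0, 0.4]))
    print(is_seizure(weights, bias, [0.1, 0.8]))
```

Only the per-patient weights and bias change between patients in this picture, which is one way to read "patient-specific" classification.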
Pub Date: 2012-04-03 DOI: 10.1109/ISSCC.2012.6177070
H. Koizumi, M. Togashi, M. Nogawa, Y. Ohtomo
A burst-mode laser diode driver circuit (BLDD) for 10Gb/s-class passive optical network (10G-EPON) systems reduces power consumption by 94% while the laser diode (LD) is in the off state. The off-state optical launch power is kept at less than -45dBm while meeting the transistor breakdown condition. The BLDD recovers to the active state within 16ns, which is 46x faster than that of a previously reported burst-mode transmitter, and the fast recovery makes efficient burst-by-burst power saving possible.
Title: A 10Gb/s burst-mode laser diode driver for burst-by-burst power saving
Pub Date: 2012-04-03 DOI: 10.1109/ISSCC.2012.6176988
E. Karl, Yih Wang, Y. Ng, Z. Guo, F. Hamzaoglu, U. Bhattacharya, Kevin Zhang, K. Mistry, M. Bohr
Future product applications demand increasing performance with reduced power consumption, which motivates the pursuit of high performance at reduced operating voltages. Random and systematic device variations pose significant challenges to SRAM VMIN and low-voltage performance as technology scaling follows Moore's law to the 22nm node. A high-performance, voltage-scalable 162Mb SRAM array is developed in a 22nm tri-gate bulk technology featuring 3rd-generation high-k metal-gate transistors and 5th-generation strained silicon. Tri-gate technology reduces short-channel effects (SCE) and improves subthreshold slope to provide 37% improved device performance at 0.7V. The continuous device width sizing of planar technology is replaced by combining parallel silicon fins to multiply drive current. Process-circuit co-optimization of the transient voltage collapse write assist (TVC-WA) and wordline underdrive read assist (WLUD-RA) features addresses process variation and fin quantization at 22nm and enables a 175mV reduction in the supply voltage required for 2GHz SRAM operation. Figure 13.1.1 shows an SEM top-down view of a 0.092μm² high-density 6T SRAM bitcell (HDC) and a 0.108μm² low-voltage 6T SRAM cell (LVC) after gate and diffusion processing. Computational OPC/RET techniques extend the capabilities of 193nm immersion lithography to allow a 1.85× increase in array density relative to 32nm designs [1].
Title: A 4.6GHz 162Mb SRAM design in 22nm tri-gate CMOS technology with integrated active VMIN-enhancing assist circuitry
Pub Date: 2012-04-03 DOI: 10.1109/ISSCC.2012.6177079
Meng-Fan Chang, Che-Wei Wu, Chia-Chen Kuo, S. Shen, Ku-Feng Lin, Shu-Meng Yang, Y. King, Chorng-Jung Lin, Y. Chih
Numerous low-supply-voltage (VDD) mobile chips, such as energy-harvesting-powered devices and biomedical applications, require low-VDD on-chip nonvolatile memory (NVM) for low-power active-mode access and power-off data storage. However, conventional NVMs cannot achieve low-VDD operation for two reasons: the write voltage generated by charge-pump (CP) circuits is insufficient at a low VDD, and there is a lack of low-VDD current-mode sense amplifiers (CSA) [1-4] that overcome the read issues of reduced sensing margins, degraded speeds, and insufficient voltage headroom (VHR). Resistive RAM (ReRAM) [4-6] is a promising memory with the advantages of short write time, low write voltage, and reduced write power compared to Flash and other NVMs. Using a low-VDD CP with relaxed output voltage/current requirements for write operations, ReRAM is a good candidate for on-chip low-VDD NVM if a low-VDD CSA is provided, particularly for frequent-read-seldom-write applications. We develop a body-drain-driven CSA (BDD-CSA) with dynamic BL bias voltage (VBL) and small VHR for larger sensing margins to achieve a lower VDDmin, faster read speed, and better tolerance of read cell current (ICELL) and BL leakage current (IBL-LEAK) variations compared to conventional CSAs. A fabricated 65nm 4Mb ReRAM macro using the BDD-CSA and our CMOS-logic-compatible ReRAM cell [7] achieves 0.5V VDDmin. The BDD-CSA itself achieves 0.32V VDDmin.
Title: A 0.5V 4Mb logic-process compatible embedded resistive RAM (ReRAM) in 65nm CMOS using low-voltage current-mode sensing scheme with 45ns random read time
Pub Date: 2012-04-03 DOI: 10.1109/ISSCC.2012.6176866
Y. Yano
Microelectronics has evolved to save power. In fact, semiconductor technology has led the reduction of power consumption by facilitating energy monitoring and the control and management of energy consumption. The key product in this advance has been a less-commonly-known semiconductor device called the microcontroller (MCU). That the MCU uses very little power was demonstrated for the first time by Renesas in 2006, by operating a low-power MCU from the electricity generated by 4 lemons! Subsequently, in 2011, we succeeded in operating our latest low-power MCU for 3 hours and 45 minutes using one lemon as a power source. Yet, in the future, MCUs must evolve further to save power in widespread applications, including the "energy harvesting" environment.
Title: Take the expressway to go greener
Pub Date: 2012-04-03 DOI: 10.1109/ISSCC.2012.6176966
S. Hsu, A. Agarwal, M. Anders, S. Mathew, Himanshu Kaul, F. Sheikh, R. Krishnamurthy
Energy-efficient SIMD permutation operations are key for maximizing high-performance microprocessor vector datapath utilization in multimedia, graphics, and signal processing workloads [1-3]. A wide SIMD vector permutation engine is required to achieve high-throughput data rearrangement operations on large data sets, with scaled supply voltages to deliver high energy efficiency. An ultra-low-voltage reconfigurable 4-way to 32-way SIMD vector permutation engine consisting of a 32-entry × 256b 3-read/1-write ported register file with a 256b byte-wise any-to-any permute crossbar for 2-dimensional shuffle is fabricated in 22nm CMOS. The register file integrates a vertical shuffle across multiple entries into read/write operations, and includes clockless static reads with shared P/N dual-ended transmission gate (DETG) writes, improving register file VMIN by 250mV across PVT variations with a wide dynamic operating range of 280mV to 1.1V. The permute crossbar implements an interleaved folded byte-wise multiplexer layout forming an any-to-any fully-connected tree to perform a horizontal shuffle with permute accumulate circuits, and includes vector flip-flops, stacked min-delay buffers, shared gates to average min-sized transistor variation, and ultra-low-voltage split-output (ULVS) level shifters improving logic VMIN by 150mV, while enabling peak energy efficiency of 585GOPS/W measured at 260mV, 50°C. The permutation engine occupies a dense layout of 0.048mm² (Fig. 10.1.7) while achieving: (i) nominal register file performance of 1.8GHz, 106mW measured at 0.9V, 50°C; (ii) robust register file functionality measured down to 280mV (subthreshold) with peak energy efficiency of 154GOPS/W; (iii) scalable permute crossbar performance of 2.9GHz, 69mW measured at 1.1V, 50°C with deep subthreshold operation at 240mV, 10MHz consuming 19μW; and (iv) a 64b 4×4 matrix transpose algorithm with 53% energy savings and 42% improved peak throughput of 263Gbps measured at 1.8GHz, 0.9V.
Title: A 280mV-to-1.1V 256b reconfigurable SIMD vector permutation engine with 2-dimensional shuffle in 22nm CMOS
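The data movement described in the abstract above can be pictured with a small behavioural model. The sketch below treats each 256b register-file entry as a 32-byte row. It shows a byte-wise any-to-any permute within a row (the horizontal shuffle) and a 64b 4×4 matrix transpose that gathers one element from each of four rows (a vertical shuffle across entries). This is a software illustration only; it is not the hardware's actual operation sequence, and the function names are hypothetical.

```python
from typing import List, Sequence

ROW_BYTES = 32   # one 256b register-file entry
ELEM_BYTES = 8   # one 64b matrix element


def permute_bytes(row: bytes, select: Sequence[int]) -> bytes:
    """Any-to-any byte permute: output byte i is input byte select[i]."""
    return bytes(row[s] for s in select)


def transpose_4x4_64b(rows: List[bytes]) -> List[bytes]:
    """Transpose a 4x4 matrix of 64b elements stored as four 256b rows."""
    if len(rows) != 4 or any(len(r) != ROW_BYTES for r in rows):
        raise ValueError("expected four 32-byte rows")
    out = []
    for r in range(4):
        # Output row r collects element r from every input row.
        out.append(b"".join(rows[c][r * ELEM_BYTES:(r + 1) * ELEM_BYTES]
                            for c in range(4)))
    return out


if __name__ == "__main__":
    # Element (r, c) is filled with the byte value 4 * r + c for easy checking.
    matrix = [b"".join(bytes([4 * r + c]) * ELEM_BYTES for c in range(4))
              for r in range(4)]
    transposed = transpose_4x4_64b(matrix)
    print([row[::ELEM_BYTES].hex() for row in transposed])
    print(permute_bytes(b"abcd", [3, 2, 1, 0]))
```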
Pub Date: 2012-04-03 DOI: 10.1109/ISSCC.2012.6176963
Dixian Zhao, Shailesh Kulkarni, P. Reynaert
This paper presents a 60GHz transmitter (TX) based on the outphasing technique. It avoids amplifying variable-envelope signals and reconstructs the modulated signals by vector summing two constant-amplitude phase-modulated signals using an on-chip power combiner. The proposed design achieves higher linear output power and better average efficiency than existing 60GHz solutions.
Title: A 60GHz outphasing transmitter in 40nm CMOS with 15.6dBm output power
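The outphasing idea in the abstract above, rebuilding a variable-envelope signal from two constant-amplitude signals, can be checked numerically. The sketch below uses the textbook decomposition: a baseband sample A·e^(jφ) with A ≤ A_max is split into two phasors of fixed magnitude A_max/2 at angles φ ± θ, where θ = arccos(A / A_max). It assumes an ideal, lossless combiner and is not a model of the paper's circuit. The names are illustrative.

```python
import cmath
import math
from typing import Tuple


def outphase(sample: complex, max_amplitude: float) -> Tuple[complex, complex]:
    """Split a complex baseband sample into two constant-amplitude phasors.

    Both outputs have magnitude max_amplitude / 2; their vector sum
    equals the input sample.
    """
    amplitude = abs(sample)
    if amplitude > max_amplitude:
        raise ValueError("sample amplitude exceeds max_amplitude")
    phase = cmath.phase(sample)
    theta = math.acos(amplitude / max_amplitude)  # outphasing angle
    half = max_amplitude / 2.0
    return cmath.rect(half, phase + theta), cmath.rect(half, phase - theta)


if __name__ == "__main__":
    s = 0.3 + 0.4j  # hypothetical sample with amplitude 0.5
    s1, s2 = outphase(s, max_amplitude=1.0)
    print(abs(s1), abs(s2))   # both branches have the same fixed magnitude
    print(s1 + s2)            # vector sum reconstructs the sample
```

Because each branch carries only phase information, each branch amplifier can operate at a fixed output level, which is the efficiency argument behind the technique.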