Pub Date: 2013-03-28 DOI: 10.1109/ISSCC.2013.6487676
Yaoming Sun, M. Marinkovic, G. Fischer, W. Winkler, W. Debski, S. Beer, T. Zwick, M. Girma, J. Hasch, C. Scheytt
This paper presents an integrated mixed-signal 120GHz FMCW/CW radar chipset in a 0.13μm SiGe BiCMOS technology. It features on-chip MMW built-in-self-test (BIST) circuits, a harmonic transceiver, software linearization (SWL) circuits and a digital interface. This chipset has been tested in a low-cost package, where the antennas are integrated. Above 100GHz, our transceiver has achieved state-of-the-art integration level, receiver linearity, and DC power consumption.
Title: A low-cost miniature 120GHz SiP FMCW/CW radar sensor with software linearization. In: 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 148-149.
Pub Date: 2013-03-28 DOI: 10.1109/ISSCC.2013.6487681
Yongha Park, Chang-Hyo Yu, Kilwhan Lee, Hyunsuk Kim, Youngeun Park, Chun-Ho Kim, Yunseok Choi, Jinhong Oh, Chang-Hoon Oh, Gurnrack Moon, Sangduk Kim, H. Jang, Jin-Aeon Lee, Chi-Yong Kim, Sungho Park
72.5GFLOPS GPGPU computing, 240 Mpixel/s sustainable image signal processing and 60fps 1080p multi-format video codec (MFC) capabilities are integrated with a 1.7GHz out-of-order-execution dual-core ARMv7A architecture CPU and a 12.8GB/s memory subsystem for a next-generation application processor. The GPU-based general-purpose computing capability can deliver 10× higher energy efficiency in compute-intensive multimedia applications, compared with a CPU solution on the same die. The improved energy efficiency with GPGPU computing enables next-generation fused multimedia applications, with the assistance of dedicated high-performance low-power multimedia accelerators, as well as with low-power design and process technology, as shown in Fig. 9.4.1.
Title: 72.5GFLOPS 240Mpixel/s 1080p 60fps multi-format video codec application processor enabled with GPGPU for fused multimedia application. In: 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 160-161.
Pub Date: 2013-03-28 DOI: 10.1109/ISSCC.2013.6487686
Milad Darvishi, R. V. D. Zee, B. Nauta
Radio receivers should be robust to large out-of-band blockers with small degradation in their sensitivity. N-path mixers can be used as mixer-first receivers [1] with good linearity and RF filtering [2]. However, 1/f noise calls for large active device sizes for IF circuits and high power consumption. The 1/f noise issue can be relaxed by having RF gain. However, to avoid desensitization by large out-of-band blockers, a bandpass filter (BPF) with sharp cut-off frequency is required in front of the RF amplifiers. gm-C BPFs suffer from tight tradeoffs among DR, power consumption, Q and fc. Also, on-chip Q-enhanced LC BPFs [3] are not suitable due to low DR, large area and non-tunability. Therefore, bulky and non-tunable SAW filters are used. N-path BPFs offer high Q while their center frequency is tuned by the clock frequency [2]. Compared to gm-C filters, this technique decouples the required Q from the DR. The 4-path filter in [4] has only 2nd-order filtering and limited rejection. The order and rejection of N-path BPFs can be increased by cascading [5], but this renders a “round” passband shape. The 4th-order 4-path BPF in [6] has a “flat” passband shape and high rejection but a high NF. This work solves the noise issue of [6] while achieving the same out-of-band linearity and adding 25dB of voltage gain to relax the noise requirement of the subsequent stages.
Title: A 0.1-to-1.2GHz tunable 6th-order N-path channel-select filter with 0.6dB passband ripple and +7dBm blocker tolerance. In: 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 172-173.
Pub Date: 2013-03-28 DOI: 10.1109/ISSCC.2013.6487618
Lisa T. Su
Anyone wishing to drive advances in computing technology must carefully negotiate key trade-offs. First, reducing power consumption is increasingly critical. Consumers want improved battery life, size, and weight for their laptops, tablets, and smartphones. Likewise, data-center power demands and cooling costs continue to rise. Concurrent is the demand for improved performance that enables compelling new user experiences. Users want to access devices through more natural interfaces (speech and gesture); they also want devices to manage ever-expanding volumes of data (home movies, pictures, and a world of content available in the cloud). An essential part of making these new user experiences available is programmer productivity; software developers must easily be able to tap into new capabilities by using familiar, powerful programming models. Finally, it is increasingly important that software be supported across a broad spectrum of devices; developers cannot sustain today's trend of re-writing code for an ever expanding number of different platforms.
Title: Architecting the future through heterogeneous computing. In: 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 8-11.
Pub Date: 2013-03-28 DOI: 10.1109/ISSCC.2013.6487622
S. Parikh, T. Kao, Y. Hidaka, J. Jiang, Asako Toda, S. McLeod, W. Walker, Y. Koyanagi, Toshiyuki Shibuya, J. Yamada
Standards such as OIF CEI-25G, CEI-28G and 32G-FC require transceivers operating at high data rates over imperfect channels. Equalizers are used to cancel the inter-symbol interference (ISI) caused by frequency-dependent channel losses such as skin effect and dielectric loss. The primary objective of an equalizer is to compensate for high-frequency loss, which often exceeds 30dB at fs/2. However, due to the skin effect in a PCB stripline, which starts at 10MHz or lower, we also need to compensate for a small amount of loss at low frequency (e.g., 500MHz). Figure 2.1.1 shows simulated responses of a backplane channel (42.6dB loss at fs/2 for 32Gb/s) with conventional high-frequency equalizers only (4-tap feed-forward equalizer (FFE), 1st-order continuous-time linear equalizer (CTLE) with a dominant pole at fs/4, and 1-tap DFE) and with additional low-frequency equalization. Conventional equalizers cannot compensate for the small amount of low-frequency loss because the slope of the low-frequency loss is too gentle (<3dB/dec). The FFE and CTLE do not have a pole in the low-frequency region and hence have only a steep slope of 20dB/dec above their zero. The DFE cancels only short-term ISI. Effects of such low-frequency loss have often been overlooked or neglected, because 1) the loss is small (2 to 3dB), 2) when plotted using the linear frequency axis which is commonly used to show frequency dependence of skin effect and dielectric loss, the low-frequency loss is degenerated at DC and hardly visible (Fig. 2.1.1a), and 3) the long ISI tail of the channel pulse response seems well cancelled at first glance by conventional equalizers only (Fig. 2.1.1b). However, the uncompensated low-frequency loss causes non-negligible long-term residual ISI, because the integral of the residual ISI magnitude keeps going up for several hundred UI. As shown by the eye diagrams in the inset of Fig. 2.1.1(b), the residual long-term ISI results in 0.42UI data-dependent jitter (DDJ) that is difficult to reduce further by enhancing FFE/CTLE/DFE, but can be reduced to 0.21UI by adding a low-frequency equalizer (LFEQ). Savoj et al. also recently reported long-tail cancellation [2].
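The abstract does not describe the LFEQ circuit itself, so the following is only a generic illustration of why a gentle low-frequency slope needs a pole close to the zero. It models a hypothetical one-zero, one-pole stage H(f) = (1 + jf/f_zero) / (1 + jf/f_pole); the function name, the 100MHz zero frequency and the 3dB target boost are assumptions chosen for the sketch, not values from the paper.

```python
import math


def lfeq_gain_db(f_hz, f_zero_hz, f_pole_hz):
    """Magnitude in dB of H(f) = (1 + j*f/f_zero) / (1 + j*f/f_pole)."""
    num = math.hypot(1.0, f_hz / f_zero_hz)
    den = math.hypot(1.0, f_hz / f_pole_hz)
    return 20.0 * math.log10(num / den)


# Assumed, illustrative values (not taken from the paper).
F_ZERO_HZ = 100e6   # hypothetical zero frequency
BOOST_DB = 3.0      # total boost, in the 2 to 3dB range quoted for the loss
# Placing the pole at f_zero * 10^(boost/20) caps the boost at BOOST_DB.
F_POLE_HZ = F_ZERO_HZ * 10 ** (BOOST_DB / 20.0)

for f in (1e6, 10e6, 100e6, 1e9, 10e9):
    gain = lfeq_gain_db(f, F_ZERO_HZ, F_POLE_HZ)
    print(f"{f / 1e6:8.0f} MHz: {gain:5.2f} dB")
```

Because the pole sits less than half an octave above the zero, the gain rises by at most 3dB in total and never approaches the 20dB/dec slope of a lone zero, which is the behaviour the text says the FFE and CTLE cannot provide.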
Title: A 32Gb/s wireline receiver with a low-frequency equalizer, CTLE and 2-tap DFE in 28nm CMOS. In: 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 28-29.
Pub Date: 2013-03-28 DOI: 10.1109/ISSCC.2013.6487651
Sang-Sung Lee, Jaeheon Lee, In-Young Lee, Sang-Gug Lee, J. Ko
RFID systems use backscattering communication in which the TX transmits a continuous wave (CW) to provide energy to the tag while the RX receives data from it. Due to the simultaneous operation of the RX and TX, large TX leakage is the main issue in securing RX sensitivity. Although external isolation components such as a circulator or directional coupler are widely used in RFID systems, TX leakage is still a dominant source of sensitivity degradation due to its finite isolation and environmentally dependent antenna reflection ratio, as shown in Fig. 5.6.1(a). In a single-antenna-based RFID system, the TX carrier leakage is typically above 0dBm at the RX input despite off-chip isolation components [1]. As can be seen in Fig. 5.6.1(b), when the close-in phase noise of the TX carrier is -85dBc/Hz, the phase noise level of 0dBm TX leakage in the receive channel reaches 89dB higher than the thermal noise level, thus directly degrading the SNR. In efforts to solve the leakage problem, leakage cancellation [2,3] and self-correlated RX [4] techniques have been reported. However, high power consumption for leakage replica generation and long calibration time, as in [2,3], and hardware complexity for a 45 degree phase shift [4] are issues that need to be resolved.
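The 89dB figure quoted above can be reproduced with simple dB arithmetic. The sketch below assumes the usual room-temperature thermal noise floor of about -174dBm/Hz, which the abstract does not state explicitly; the function names are illustrative.

```python
# Thermal noise density kT at roughly 290 K (standard approximation, assumed here).
THERMAL_NOISE_DBM_PER_HZ = -174.0


def leakage_noise_dbm_per_hz(leak_dbm, phase_noise_dbc_per_hz):
    """Noise density contributed by the phase-noise skirt of a leaked carrier."""
    return leak_dbm + phase_noise_dbc_per_hz


def excess_over_thermal_db(leak_dbm, phase_noise_dbc_per_hz):
    """How far the leaked phase noise sits above the thermal noise floor."""
    return (leakage_noise_dbm_per_hz(leak_dbm, phase_noise_dbc_per_hz)
            - THERMAL_NOISE_DBM_PER_HZ)


# Values from the abstract: 0dBm TX leakage, -85dBc/Hz close-in phase noise.
print(excess_over_thermal_db(0.0, -85.0))  # 89.0
```

A 0dBm carrier with -85dBc/Hz phase noise places -85dBm/Hz of noise in the receive channel, which is 89dB above -174dBm/Hz.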
Title: A new TX leakage-suppression technique for an RFID receiver using a dead-zone amplifier. In: 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 92-93.
Pub Date: 2013-03-28 DOI: 10.1109/ISSCC.2013.6487803
S. Takaya, M. Nagata, A. Sakai, T. Kariya, S. Uchiyama, H. Kobayashi, H. Ikeda
Three-dimensional (3D) stacking of memory chips is a promising direction for implementing memory systems in mobile applications and for low-cost high-performance computation. The requirements are extremely low power consumption, high data bandwidth, stability and scalability of operation, as well as large storage capacity with a small footprint. A digital control chip at the base of the stack is needed to efficiently access the 3D memory hierarchy, as well as to emulate a standard memory interface for compatibility. The overall performance and yields of a 3D system are constrained by vertical communication channels among the stacked chips, as well as the connections to the PCB. However, the empirical models presently used in the design stage do not properly represent the electrical and mechanical properties and performance variations of through-silicon vias (TSVs) and microbumps (μBumps). What is needed are circuit techniques that handle such uncertainties to enable the creation of robust 3D data links. This paper presents a complete test vehicle for TSV-based wide I/O data communication in a three-tier 3D chip stack assembled in a BGA package. In-place eye-diagram and waveform capturers are mounted in an active silicon interposer to characterize vertical signaling through the chain of TSVs and μBumps.
Title: A 100GB/s wide I/O with 4096b TSVs through an active silicon interposer with in-place waveform capturing. In: 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 434-435.
Pub Date: 2013-03-28 DOI: 10.1109/ISSCC.2013.6487751
M. Sinangil, A. Chandrakasan
Mobile applications such as tablets pack increasingly more processing capability comparable to workstations or laptops but can do little for cooling or extending the battery life in their form factors. SRAMs account for a large fraction of chip area and are critical in this context. Recent work has focused on voltage scaling in SRAMs, which is an effective way of achieving energy efficiency [1,2]. These conventional SRAMs are mostly general-purpose in the sense that they are designed without considering the specific features of the data they will store. However, application-specific features such as statistics of storage data can be exploited and incorporated into the transistor-level design to provide a new dimension towards achieving the next level of energy savings in addition to the savings provided through voltage scaling. The work in [3] is an example where an inversion bit is added for each word to reduce read-bitline (RBL) transitions in an 8T-cell-based design with a single-ended read port. Similarly, the work in [4] stores only the LSBs of each word in 6T SRAMs where occasional bit-errors at low voltages are tolerable for its application. In this work, we focus on video; however, the ideas can be generalized to different applications. In video encoders, pixel processing is performed over large partitions of image frames (e.g., 192×192 pixels), which are stored in on-chip SRAMs and accessed frequently. Image frames generally consist of smooth backgrounds or large objects where the intensity of pixels is spatially correlated. For the video image frame in Fig. 18.2.1, the deviation of each pixel's intensity from its block average for a 16×16 block shows that 76% of pixels lie within 3 LSB of the average. This additional information can be used to design an SRAM where correlation of data is used to reduce bitline activity factor which, for an 8T SRAM in a 65nm low-power CMOS process, accounts for ~50% of total energy consumption during read accesses at 0.6V. 
In this work, we present a prediction-based reduced-bitline-switching-activity (PB-RBSA) scheme along with a hierarchical sensing network with statistical sense-amplifier gating to exploit the correlation of storage data. Reduction of switching activity on the bitlines and in the sensing network of the memory provides up to 1.9× reduction in energy/access.
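The pixel statistic cited above (the share of pixels within 3 LSB of their 16×16 block average) can be measured on any frame with a few lines of code. This is a minimal sketch of that measurement only, not of the PB-RBSA circuit; the function name, the inclusive tolerance and the synthetic test frame are assumptions made for illustration.

```python
def fraction_near_block_mean(frame, block=16, tol=3):
    """Fraction of pixels whose intensity is within `tol` LSB (inclusive,
    an assumption) of the mean of the block x block tile they belong to.

    `frame` is a list of equal-length rows of integer pixel intensities.
    """
    rows, cols = len(frame), len(frame[0])
    near = 0
    total = 0
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Collect the pixels of one tile (edge tiles may be smaller).
            pixels = [frame[r][c]
                      for r in range(r0, min(r0 + block, rows))
                      for c in range(c0, min(c0 + block, cols))]
            mean = sum(pixels) / len(pixels)
            near += sum(1 for p in pixels if abs(p - mean) <= tol)
            total += len(pixels)
    return near / total


# Synthetic smooth frame (hypothetical data): a slow diagonal intensity ramp.
smooth = [[100 + (r + c) // 8 for c in range(32)] for r in range(32)]
print(f"{fraction_near_block_mean(smooth):.2f}")
```

On real video frames the result depends on content; the 76% quoted in the text refers to the specific frame in Fig. 18.2.1.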
Title: An SRAM using output prediction to reduce BL-switching activity and statistically-gated SA for up to 1.9× reduction in energy/access. In: 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 318-319.
Pub Date: 2013-03-28 DOI: 10.1109/ISSCC.2013.6487754
John Davis, Paul Bunce, Diana M. Henderson, Y. Chan, U. Srinivasan, D. Rodko, P. Patel, T. Knips, T. Werner
The L1 cache for the 5.5 GHz 32nm zEnterprise™ EC12 processor requires SRAM designs that make aggressive use of dynamic circuitry. As technology has scaled and transistor counts have grown, random device variability [1] and power limitations have become significant challenges. In particular, random device-variability-induced pulse shrinkage and misalignment in dynamic signals must be carefully addressed. Described here are a series of new design approaches enabling L1 cache SRAM operation at 7GHz, including a 3-level bitline hierarchy, decreased dynamic path lengths, localized read enables, and a power-savings mechanism in which selective columns can be partially powered down.
Title: 7GHz L1 cache SRAMs for the 32nm zEnterprise™ EC12 processor. In: 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 324-325.
Pub Date : 2013-03-28 DOI: 10.1109/ISSCC.2013.6487696
M. Natsui, D. Suzuki, N. Sakimura, R. Nebashi, Y. Tsuji, A. Morioka, T. Sugibayashi, S. Miura, H. Honjo, K. Kinoshita, S. Ikeda, T. Endoh, H. Ohno, T. Hanyu
Nonvolatile logic-in-memory (NV-LIM) architecture [1], in which magnetic tunnel junction (MTJ) devices [2] are distributed over a CMOS logic-circuit plane, has the potential to overcome the serious power-consumption problem that has rapidly become a dominant constraint on performance improvement in today's VLSI processors. Because MTJ devices are nonvolatile and three-dimensionally stackable, this structure offers normally-off and instant-on capabilities with a small area penalty, which allows power gating to be applied at a fine temporal granularity and can completely eliminate the power wasted by leakage current. However, the impact of embedding nonvolatile memory devices into a logic circuit has so far been demonstrated only with small fabricated primitive logic-circuit elements [3], memory-like structures such as FPGAs [4], or circuit simulation, because an established MTJ-oriented design flow reflecting the chip-fabrication environment has been lacking, even as larger-capacity and/or high-speed-access MRAM has been increasingly developed. In this paper, we present MTJ/MOS-hybrid video-coding hardware that uses a cycle-based power-gating technique in a practical-scale MTJ-based NV-LIM LSI, which is fully designed using the established semi-automated MTJ-oriented design flow.
Title: Nonvolatile logic-in-memory array processor in 90nm MTJ/MOS achieving 75% leakage reduction using cycle-based power gating. Published in: 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 194-195.
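The idea behind cycle-based power gating can be illustrated with a rough energy model: if a block leaks only in the cycles where it is powered, its leakage energy scales with the number of active cycles rather than with the total cycle count. The Python sketch below is an assumption-laden illustration, not the authors' method. The function names, the per-wakeup overhead term, and all numeric values are hypothetical and are not taken from the paper.

```python
def leakage_energy(active, p_leak_w, t_cycle_s, wake_overhead_j=0.0, gated=True):
    """Leakage energy in joules for one block over a schedule of cycles.

    active: sequence of booleans, True where the block does work in that cycle.
    p_leak_w: leakage power while powered (hypothetical value, in watts).
    t_cycle_s: clock period in seconds.
    wake_overhead_j: assumed energy cost of each off-to-on transition.
    gated: if False, the block stays powered (and leaks) in every cycle.
    """
    active = list(active)
    if not active:
        raise ValueError("schedule must contain at least one cycle")
    powered_cycles = sum(active) if gated else len(active)
    energy = powered_cycles * p_leak_w * t_cycle_s
    if gated:
        # Count off-to-on transitions; the block is assumed off before cycle 0.
        wakeups = sum(1 for prev, cur in zip([False] + active, active)
                      if cur and not prev)
        energy += wakeups * wake_overhead_j
    return energy


def leakage_reduction(active, p_leak_w, t_cycle_s, wake_overhead_j=0.0):
    """Fraction of leakage energy saved by gating, relative to always-on."""
    base = leakage_energy(active, p_leak_w, t_cycle_s, gated=False)
    gated = leakage_energy(active, p_leak_w, t_cycle_s, wake_overhead_j, gated=True)
    return 1.0 - gated / base


if __name__ == "__main__":
    # Hypothetical schedule: 3 active cycles out of 10, with 2 wakeups.
    schedule = [True, False, False, True, True,
                False, False, False, False, False]
    print(leakage_reduction(schedule, p_leak_w=1e-3, t_cycle_s=1e-8))
    print(leakage_reduction(schedule, p_leak_w=1e-3, t_cycle_s=1e-8,
                            wake_overhead_j=5e-12))
```

In this simplified model the saving approaches the idle fraction of the schedule when wakeup overhead is negligible, and shrinks as the assumed overhead per transition grows. A nonvolatile design is attractive in this setting because state does not have to be saved and restored around each gated interval.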