Pub Date: 2015-07-22 DOI: 10.1109/ISLPED.2015.7273500
Cheng Li, P. Ampadu
Whereas buffers significantly impact Network-on-Chip (NoC) performance, they also account for up to 75% of NoC router area and nearly 50% of router power. Traditionally, SRAM has been used as an area- and power-efficient implementation of the router buffer. However, motivated by the smaller size and lower-power potential of planar embedded DRAM (eDRAM), we implement the router buffer using a 3T NMOS eDRAM for improved power and area efficiency. We demonstrate that the lifetime of flits stalled in the NoC router buffer is much shorter than the retention time of currently available eDRAM. This observation allows us to make the appropriate trade-off in size and sense-amplifier complexity to meet power and performance requirements. A low-overhead need-based refresh mechanism is further explored. With a conservative buffer design using 65nm CMOS technology, our method reduces buffer area by up to 52% and power by 43%, while maintaining performance similar to an SRAM-based buffer. In a NoC router with 128-bit channel width, we achieve 26% and 11% reductions in total router area and power, respectively. We conclude that an eDRAM-based buffer is a power- and area-efficient alternative to an SRAM-based buffer for NoC router design.
{"title":"A compact low-power eDRAM-based NoC buffer","authors":"Cheng Li, P. Ampadu","doi":"10.1109/ISLPED.2015.7273500","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273500","journal":{"name":"2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)"},"publicationDate":"2015-07-22"}
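The need-based refresh idea in the eDRAM buffer abstract can be illustrated with a minimal sketch. The abstract does not describe the mechanism itself, so everything here is an assumption made for illustration: the `BufferSlot` type, the `needs_refresh` function, and the `retention_cycles` and `guard_cycles` parameters are hypothetical names. The sketch only captures the stated observation that most flits leave the buffer long before the retention time expires, so a refresh is needed only for the rare flit that stalls for too long.

```python
from dataclasses import dataclass


@dataclass
class BufferSlot:
    """One flit slot in a hypothetical eDRAM-based router buffer."""
    write_cycle: int      # cycle at which the flit was written into the slot
    valid: bool = True    # False once the flit has left the router


def needs_refresh(slot: BufferSlot, now: int,
                  retention_cycles: int, guard_cycles: int) -> bool:
    """Return True only when a still-valid flit is close to outliving retention.

    retention_cycles and guard_cycles are illustrative parameters, not values
    taken from the paper.
    """
    if not slot.valid:
        return False  # an empty slot never needs a refresh
    age = now - slot.write_cycle
    return age >= retention_cycles - guard_cycles
```

Under this model a flit that drains after a few tens of cycles never triggers a refresh, which is the reason such a mechanism could stay low-overhead.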
Pub Date: 2015-07-22 DOI: 10.1109/ISLPED.2015.7273485
Tanguy Sassolas, C. Sandionigi, Alexandre Guerre, Julien Mottin, P. Vivet, H. Boussetta, N. Peltier
Modern SoCs are characterized by increasing power density and consequently increasing temperature, which directly impacts the performance, reliability, and cost of a device through its packaging. Thermal issues need to be predicted and mitigated as early as possible in the design flow, when the optimization opportunities are greatest. In this paper, we present an efficient framework for the design of dynamic thermal mitigation schemes based on a high-level SystemC virtual prototype tightly coupled with efficient power and thermal simulation tools. We demonstrate the benefit of our approach through silicon comparison with the SThorm 64-core architecture and provide simulation speed results that make it a sound solution for designing thermal mitigation early in the flow.
{"title":"A simulation framework for rapid prototyping and evaluation of thermal mitigation techniques in many-core architectures","authors":"Tanguy Sassolas, C. Sandionigi, Alexandre Guerre, Julien Mottin, P. Vivet, H. Boussetta, N. Peltier","doi":"10.1109/ISLPED.2015.7273485","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273485","journal":{"name":"2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)"},"publicationDate":"2015-07-22"}
Pub Date: 2015-07-22 DOI: 10.1109/ISLPED.2015.7273517
A. Rahmani, M. Haghbayan, A. Kanduri, Awet Yemane Weldezion, P. Liljeberg, J. Plosila, A. Jantsch, H. Tenhunen
Power management of NoC-based many-core systems with runtime application mapping becomes more challenging in the dark silicon era. It necessitates a multi-objective control approach that considers an upper limit on total power consumption, the dynamic behaviour of workloads, processing-element utilization, per-core power consumption, and the load on the network-on-chip. In this paper, we propose a multi-objective dynamic power management method that simultaneously considers all of these parameters. Fine-grained voltage and frequency scaling, including near-threshold operation, and per-core power gating are used to optimize performance. In addition, a disturbance rejecter is designed that proactively scales down activity in running applications when a new application commences execution, to prevent sharp power budget violations. Simulations of dynamic workloads and mixed time-critical application profiles show that our method is effective in honoring the power budget while considerably boosting system throughput and reducing power budget violations, compared to state-of-the-art power management policies.
{"title":"Dynamic power management for many-core platforms in the dark silicon era: A multi-objective control approach","authors":"A. Rahmani, M. Haghbayan, A. Kanduri, Awet Yemane Weldezion, P. Liljeberg, J. Plosila, A. Jantsch, H. Tenhunen","doi":"10.1109/ISLPED.2015.7273517","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273517","journal":{"name":"2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)"},"publicationDate":"2015-07-22"}
Pub Date: 2015-07-22 DOI: 10.1109/ISLPED.2015.7273487
Fulya Kaplan, A. Coskun
CMOS scaling trends lead to elevated on-chip temperatures, which substantially limit the performance of today's processors. To improve thermal efficiency, Phase Change Materials (PCMs) have recently been used as passive cooling solutions. PCMs store a large amount of heat at near-constant temperature during phase change, allowing strategies such as computational sprinting. While existing sprinting methods allow short performance boosts, there is significant unexplored potential for improving performance on systems with PCM-enhanced cooling. To this end, this paper proposes a novel runtime management policy driven by observations that are not captured by prior techniques: (i) PCM melts non-uniformly due to spatially heterogeneous on-chip heat distribution; (ii) power consumption during sprinting is highly application dependent, and assuming a fixed sprinting power leads to lower thermal efficiency; (iii) if we monitor the remaining PCM energy at various locations, we can utilize the PCM heat storage capability much more efficiently. The proposed Adaptive Sprinting policy exploits these observations to extend sprinting duration for increased performance gains. Our policy monitors the remaining PCM energy corresponding to each core at runtime and uses this information to decide on the number, the location, and the voltage-frequency (V/f) setting of the sprinting cores. Experimental evaluation including a detailed phase change thermal model demonstrates 29% performance improvement, 22% energy savings, and 43% energy delay product (EDP) reduction on average, compared to prior strategies.
{"title":"Adaptive sprinting: How to get the most out of Phase Change based passive cooling","authors":"Fulya Kaplan, A. Coskun","doi":"10.1109/ISLPED.2015.7273487","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273487","journal":{"name":"2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)"},"publicationDate":"2015-07-22"}
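The decision step described in the Adaptive Sprinting abstract (choosing how many cores sprint, which ones, and at what V/f setting, based on per-core remaining PCM energy) can be sketched as a simple greedy selection. This is not the authors' policy: the `plan_sprint` function, the `VF_LEVELS` table, and the energy costs are hypothetical values chosen only to make the idea concrete.

```python
from typing import Dict, List, Tuple

# Hypothetical V/f levels, fastest first: (label, PCM energy in joules that
# one sprint interval at this level is assumed to consume).
VF_LEVELS: List[Tuple[str, float]] = [("high", 3.0), ("mid", 2.0), ("low", 1.0)]


def plan_sprint(remaining_pcm_j: Dict[int, float], max_cores: int) -> Dict[int, str]:
    """Pick up to max_cores sprinting cores and a V/f label for each.

    Cores with the most remaining PCM energy are considered first, and each
    selected core gets the fastest level whose assumed cost it can absorb.
    """
    plan: Dict[int, str] = {}
    ranked = sorted(remaining_pcm_j.items(), key=lambda kv: kv[1], reverse=True)
    for core, energy in ranked:
        if len(plan) == max_cores:
            break
        for label, cost in VF_LEVELS:
            if energy >= cost:
                plan[core] = label
                break  # fastest affordable level found for this core
    return plan
```

For example, `plan_sprint({0: 3.5, 1: 0.5, 2: 2.2, 3: 1.0}, max_cores=2)` selects cores 0 and 2, at the "high" and "mid" levels respectively, and leaves the nearly depleted core 1 idle.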
Pub Date: 2015-07-22 DOI: 10.1109/ISLPED.2015.7273507
Woojoo Lee, Yanzhi Wang, Donghwa Shin, Shahin Nazarian, Massoud Pedram
Dynamic voltage scaling (DVS) has proven effective in minimizing the power consumption of OLED displays while causing only minimal image distortion. This technique has been extended to perform zone-specific DVS by dividing the panel area into zones and applying independent DVS to each zone based on the displayed content. Zone-specific DVS has not yet been applied to large-area OLED displays, in part because of the high overhead of a dedicated DC-DC converter for each zone and the low conversion efficiency when the load current of a converter lies outside the desirable range. To address this issue, this work proposes a reconfigurable power delivery network architecture, comprising a small number of DC-DC converters, a switch network, and an online controller, to realize fine-grained (zone-specific) DVS in large-area OLED display panels. The proposed framework consistently achieves high power conversion efficiency and significant energy savings while preserving image quality. Experimental results demonstrate that up to 36% power savings can be achieved in a 65" 4K Ultra high-definition OLED display by using the proposed framework.
{"title":"Design and optimization of a reconfigurable power delivery network for large-area, DVS-enabled OLED displays","authors":"Woojoo Lee, Yanzhi Wang, Donghwa Shin, Shahin Nazarian, Massoud Pedram","doi":"10.1109/ISLPED.2015.7273507","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273507","journal":{"name":"2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)"},"publicationDate":"2015-07-22"}
Pub Date: 2015-07-22 DOI: 10.1109/ISLPED.2015.7273541
Byungkyu Song, T. Na, Seong-ook Jung, Jung Pill Kim, Seung H. Kang
A global reference circuit (RC), in which one RC is shared by many sensing circuits (SCs), is being considered for high-bandwidth STT-RAMs because of its low power consumption and small area. However, using a global RC for high-bandwidth STT-RAMs causes a droop effect and a coupling noise effect, leading to significant performance degradation. Thus, the validity of using a global RC should be examined. In this paper, the local RC and various global RCs are introduced and compared in terms of area, sensing time, and power consumption. By classifying the merits and demerits of the various RCs, we present the following requirements for a proper RC for high-bandwidth STT-RAMs: 1) small area, 2) no performance degradation, 3) low power consumption, and 4) process-variation-tolerant reference signal generation.
{"title":"Reference-circuit analysis for high-bandwidth spin transfer torque random access memory","authors":"Byungkyu Song, T. Na, Seong-ook Jung, Jung Pill Kim, Seung H. Kang","doi":"10.1109/ISLPED.2015.7273541","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273541","journal":{"name":"2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)"},"publicationDate":"2015-07-22"}
Pub Date: 2015-07-22 DOI: 10.1109/ISLPED.2015.7273508
Anup Das, M. J. Walker, Andreas Hansson, B. Al-Hashimi, G. Merrett
Applications running on smartphones interact with the hardware and the system software differently, resulting in widely varying power consumption and hence thermal profiles. Typically, these smartphone platforms expose some hardware power control features to users, controlled through software governors such as cpufreq for dynamic voltage-frequency scaling (DVFS) and cpuquiet for dynamic core selection (DCS). Operating systems on these platforms manage these governors conservatively, independent of the application's performance requirements. To address this, we propose an alternative approach, which uses reinforcement learning to explore, at run-time, the trade-off between the power saving opportunities offered by DVFS and DCS and the application's performance. The objective is to reduce power consumption, taking into consideration dynamic power, leakage power, and the inter-dependency between temperature and power. The reinforcement learning-based control is validated as a case study on an ARM A15-based NVIDIA Tegra smartphone platform through its implementation as a run-time manager (RTM). This RTM interfaces with different hardware performance counters and the embedded Linux operating system through (1) the cpuquiet API to select cores at run-time; and (2) the cpufreq API to scale the frequency of active cores. Experiments with mobile and high-performance applications demonstrate that the proposed approach achieves an average 22% (7-40%) power reduction compared to existing techniques.
{"title":"Hardware-software interaction for run-time power optimization: A case study of embedded Linux on multicore smartphones","authors":"Anup Das, M. J. Walker, Andreas Hansson, B. Al-Hashimi, G. Merrett","doi":"10.1109/ISLPED.2015.7273508","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273508","journal":{"name":"2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)"},"publicationDate":"2015-07-22"}
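The abstract above says that reinforcement learning drives the choice of active cores and frequency, but it does not name the algorithm. As one possible illustration, the sketch below uses tabular Q-learning, which is an assumption rather than the authors' method. The state encoding (utilization and temperature buckets), the action encoding (core count and frequency index), and the `QTable` class are all hypothetical; a real run-time manager would apply the chosen action through the cpuquiet and cpufreq interfaces, which are not modelled here.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

State = Tuple[int, int]    # hypothetical: (utilization bucket, temperature bucket)
Action = Tuple[int, int]   # hypothetical: (active cores, frequency index)


class QTable:
    """Tabular Q-learning over (state, action) pairs; unseen pairs start at 0."""

    def __init__(self, actions: List[Action], alpha: float = 0.5,
                 gamma: float = 0.9) -> None:
        self.actions = actions
        self.alpha = alpha    # learning rate
        self.gamma = gamma    # discount factor
        self.q: DefaultDict[Tuple[State, Action], float] = defaultdict(float)

    def best_action(self, state: State) -> Action:
        """Greedy choice: the action with the highest learned value."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s: State, a: Action, reward: float, s_next: State) -> None:
        """One Q-learning step toward reward + gamma * best next-state value."""
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        target = reward + self.gamma * best_next
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
```

In such a setup the reward would typically combine measured power with a penalty for missing the application's performance target; how the paper defines its reward is not stated in the abstract.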
Pub Date: 2015-07-22 DOI: 10.1109/ISLPED.2015.7273497
Cagla Cakir, R. Ho, J. Lexau, K. Mai
As process technologies have scaled, the increasing number of processor cores and memories on a single die has also driven the need for more complex on-chip interconnection networks. Crossbar switches are primary building blocks in such networks-on-chip, as they can be used as fast single-stage networks or as the core of the router switch in multi-stage networks. While crossbars offer non-blocking, single-hop, all-to-all communication, they tend to scale poorly with the number of nodes due to the latency and energy of the long wires and high-radix multiplexer structures needed. To combat these limitations, we propose a low-swing crossbar design that uses capacitively driven wires and capacitively coupled multiplexers. Capacitively driven wires offer low-swing signaling, higher bandwidths, and low energy consumption, while capacitively coupled multiplexers offer reduced parasitic loading from the inactive inputs. We present a 16×16 64b low-swing crossbar switch designed in a TSMC 40nm CMOS bulk process. Post-layout simulation shows it operating at a maximum frequency of 2.2GHz, achieving a bandwidth of 2.56Tb/s at 0.9V (nominal Vdd) with an area of 0.94mm². Total energy consumption at full, half, and minimum bandwidth is 110pJ, 84pJ, and 64pJ respectively, thus offering an efficiency of 10.49 Tbps/W, a 3X improvement over previously published results.
{"title":"High-efficiency crossbar switches using capacitively coupled signaling","authors":"Cagla Cakir, R. Ho, J. Lexau, K. Mai","doi":"10.1109/ISLPED.2015.7273497","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273497","journal":{"name":"2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)"},"publicationDate":"2015-07-22"}
Pub Date: 2015-07-22 DOI: 10.1109/ISLPED.2015.7273489
Wei Zhang, Hang Zhang, J. Lach
To minimize the access latency of set-associative caches, the data in all ways are read out in parallel with the tag lookup. However, this is energy inefficient, as only the data from the matching way is used and the others are discarded. This paper proposes an early tag lookup (ETL) technique for L1 instruction caches that determines the matching way one cycle earlier than the cache access, so that only the matching data way need be accessed. ETL incurs no performance penalty and insignificant hardware overhead. Evaluation on a 4-way set-associative L1 instruction cache in 45nm technology shows that ETL reduces the read energy by 68% on average.
{"title":"Reducing dynamic energy of set-associative L1 instruction cache by early tag lookup","authors":"Wei Zhang, Hang Zhang, J. Lach","doi":"10.1109/ISLPED.2015.7273489","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273489","journal":{"name":"2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)"},"publicationDate":"2015-07-22"}
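The energy argument behind early tag lookup can be made concrete with a rough first-order model. The sketch below assumes that every way costs the same tag-read energy `e_tag` and data-read energy `e_data`, and it ignores the overhead of the ETL hardware itself. The function names and the numbers in the usage note are illustrative assumptions, not figures from the paper, and the model is not meant to reproduce the reported 68% saving.

```python
def read_energy_parallel(ways: int, e_tag: float, e_data: float) -> float:
    """Conventional access: every tag way and every data way is read."""
    return ways * (e_tag + e_data)


def read_energy_early_tag(ways: int, e_tag: float, e_data: float) -> float:
    """Early tag lookup: all tags are still compared, but only one data way is read."""
    return ways * e_tag + e_data


def relative_saving(ways: int, e_tag: float, e_data: float) -> float:
    """Fraction of read energy saved by reading a single data way."""
    baseline = read_energy_parallel(ways, e_tag, e_data)
    return 1.0 - read_energy_early_tag(ways, e_tag, e_data) / baseline
```

With a 4-way cache and a data way assumed to cost four times as much as a tag way, `relative_saving(4, 1.0, 4.0)` evaluates to 0.6. The saving grows with associativity and with the ratio of data-array to tag-array energy.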
Pub Date: 2015-07-22 DOI: 10.1109/ISLPED.2015.7273527
A. Romani, Antonio Camarda, A. Baldazzi, M. Tartagni
This paper introduces the use of piezoelectric transformers (PTs) as key elements of ultra-low-voltage start-up circuits for battery-less energy harvesting applications. First, a step-up oscillator topology based on a PT and a JFET is presented. The circuit is able to start from voltages as low as 16 mV and to boost the output voltage up to 1.32 V under no-load conditions. To validate the proposed approach, a surrounding power management and conversion circuit is developed. This circuit automatically enables a boost DC/DC converter once the start-up circuit has generated a sufficient voltage. The whole circuit self-starts with an input voltage of 30 mV, and the maximum conversion efficiency referred to the maximum power point (MPP) is higher than 40%, with an intrinsic current consumption as low as 1.3 μA.
{"title":"A micropower energy harvesting circuit with piezoelectric transformer-based ultra-low voltage start-up","authors":"A. Romani, Antonio Camarda, A. Baldazzi, M. Tartagni","doi":"10.1109/ISLPED.2015.7273527","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273527","journal":{"name":"2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)"},"publicationDate":"2015-07-22"}