
2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED): Latest Papers

Post placement leakage reduction with stress-enhanced filler cells
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273531
J. Choy, V. Sukharev, A. Kteyan, Henrik Hovsepyan, R. Venkatraman, R. Castagnetti
A novel methodology for post-placement leakage reduction based on stress-enhanced filler (SEF) cells was developed. The desired reduction of sub-threshold leakage in test-chip silicon was achieved by placing SEF cells close to the leakiest devices. In the standard-cell rows, “optimization zones” were defined as the portions of a row located between two consecutive fixed cells (clock cells, etc.). Mentor Graphics' stress assessment tool was used to find the optimal SEF insertion locations inside each zone, i.e., the locations that maximize the threshold-voltage increase of the leakiest transistors. Measurements on the processed silicon test chip confirmed the predicted leakage reduction of 10-15 percent while keeping the same electrical performance.
Citations: 0
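The abstract links the 10-15 percent saving to a threshold-voltage increase in the leakiest transistors. Under the textbook first-order subthreshold model, I_sub ∝ exp(-Vth/(n·vT)), the required Vth shift can be estimated as below. The slope factor n = 1.5, the thermal voltage vT ≈ 26 mV, the distance-based stress model in the usage, and the helper `best_slot` are illustrative assumptions only; the paper itself relies on Mentor Graphics' stress assessment tool to choose insertion points.

```python
import math

def leakage_ratio(delta_vth_mv, n=1.5, vt_mv=26.0):
    """I_new / I_old for subthreshold leakage after Vth rises by delta_vth_mv (first-order model)."""
    return math.exp(-delta_vth_mv / (n * vt_mv))

def vth_shift_for_reduction(fraction, n=1.5, vt_mv=26.0):
    """Vth increase (mV) needed to cut subthreshold leakage by `fraction`, e.g. 0.10 for 10%."""
    return n * vt_mv * math.log(1.0 / (1.0 - fraction))

def best_slot(slots, leaky_devices, vth_shift_mv):
    """Hypothetical zone optimizer: choose the filler slot that removes the most leakage.

    leaky_devices: list of (position, leakage) pairs for the leakiest transistors.
    vth_shift_mv(slot, position): assumed stress model returning the Vth shift in mV.
    """
    def saved(slot):
        return sum(leak * (1.0 - leakage_ratio(vth_shift_mv(slot, pos)))
                   for pos, leak in leaky_devices)
    return max(slots, key=saved)

for r in (0.10, 0.15):
    print(f"{r:.0%} leakage reduction needs about {vth_shift_for_reduction(r):.1f} mV")
# 10% leakage reduction needs about 4.1 mV
# 15% leakage reduction needs about 6.3 mV
```

Because leakage depends exponentially on Vth, a shift of only a few millivolts is enough for a 10-15% saving under these assumptions.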
Reducing dynamic energy of set-associative L1 instruction cache by early tag lookup
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273489
Wei Zhang, Hang Zhang, J. Lach
To minimize the access latency of set-associative caches, the data in all ways are read out in parallel with the tag lookup. However, this is energy-inefficient, as only the data from the matching way is used and the others are discarded. This paper proposes an early tag lookup (ETL) technique for L1 instruction caches that determines the matching way one cycle before the cache access, so that only the matching data way needs to be accessed. ETL incurs no performance penalty and negligible hardware overhead. Evaluation on a 4-way set-associative L1 instruction cache in 45nm technology shows that ETL reduces the read energy by 68% on average.
Citations: 2
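To make the energy argument concrete, the toy model below counts how many data-array ways are read per access in a conventional parallel-access cache versus one whose tag comparison has already completed a cycle earlier. It counts lookup reads only (line fills are ignored) and simply assumes the fetch address is known early enough; how ETL obtains it one cycle ahead, and the tag/data energy ratio behind the 68% figure, are not modeled. The class name and cache geometry are hypothetical.

```python
from collections import OrderedDict

class SetAssocICache:
    """Toy LRU set-associative cache that counts data-array way reads per lookup."""

    def __init__(self, num_sets=64, ways=4, line_bytes=32, early_tag_lookup=False):
        self.num_sets, self.ways, self.line_bytes = num_sets, ways, line_bytes
        self.early_tag_lookup = early_tag_lookup
        self.sets = [OrderedDict() for _ in range(num_sets)]  # per set: tags in LRU order
        self.data_way_reads = self.hits = self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        tags = self.sets[index]
        hit = tag in tags
        if self.early_tag_lookup:
            # Tag compare finished a cycle earlier: read only the matching way (none on a miss).
            self.data_way_reads += 1 if hit else 0
        else:
            # Conventional: every data way is read in parallel with the tag compare.
            self.data_way_reads += self.ways
        if hit:
            self.hits += 1
            tags.move_to_end(tag)            # mark as most recently used
        else:
            self.misses += 1
            if len(tags) >= self.ways:
                tags.popitem(last=False)     # evict the least recently used tag
            tags[tag] = None                 # line fill (its data-array write is not counted)
        return hit

trace = [addr for _ in range(10) for addr in range(0, 4096, 4)]  # 4 KB loop, 10 iterations
conv, etl = SetAssocICache(), SetAssocICache(early_tag_lookup=True)
for addr in trace:
    conv.access(addr)
    etl.access(addr)
print(conv.data_way_reads, etl.data_way_reads)  # 40960 10112
```

For a loop that fits in the cache, the parallel design reads all 4 ways on every fetch while the early-lookup design reads one way per hit, roughly a 4× cut in data-way reads before tag energy is accounted for.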
Adaptive sprinting: How to get the most out of Phase Change based passive cooling
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273487
Fulya Kaplan, A. Coskun
CMOS scaling trends lead to elevated on-chip temperatures, which substantially limit the performance of today's processors. To improve thermal efficiency, Phase Change Materials (PCMs) have recently been used as passive cooling solutions. PCMs store a large amount of heat at near-constant temperature during phase change, enabling strategies such as computational sprinting. While existing sprinting methods allow short performance boosts, there is significant unexplored potential for improving performance on systems with PCM-enhanced cooling. To this end, this paper proposes a novel runtime management policy driven by observations that prior techniques do not capture: (i) PCM melts non-uniformly because the on-chip heat distribution is spatially heterogeneous; (ii) power consumption during sprinting is highly application-dependent, and assuming a fixed sprinting power lowers thermal efficiency; (iii) if we monitor the remaining PCM energy at various locations, we can use the PCM heat storage capability much more efficiently. The proposed Adaptive Sprinting policy exploits these observations to extend the sprinting duration for greater performance gains. Our policy monitors the remaining PCM energy corresponding to each core at runtime and uses this information to decide the number, location, and voltage-frequency (V/f) setting of the sprinting cores. Experimental evaluation with a detailed phase-change thermal model shows, on average, a 29% performance improvement, 22% energy savings, and a 43% reduction in energy-delay product (EDP) compared to prior strategies.
Citations: 3
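A rough sketch of the decision the abstract describes (which cores sprint, and at what V/f level, given the PCM energy left above each core) follows. It assumes, hypothetically, that each core has an independent latent-heat budget, that the PCM absorbs all power above a sustainable level, and that the running application's per-level sprint power is known from profiling. `Core`, `plan_sprint`, and all numbers are illustrative, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Core:
    cid: int
    pcm_energy_j: float  # latent heat still available in the PCM above this core (assumed)

def sprint_duration(energy_j, power_w, sustainable_w):
    """Seconds until the PCM budget is exhausted if it absorbs all power above sustainable_w."""
    excess = power_w - sustainable_w
    return float("inf") if excess <= 0 else energy_j / excess

def plan_sprint(cores, level_power_w, sustainable_w, n_sprint, target_s):
    """Pick the n_sprint cores with the most PCM energy left, then the fastest V/f level
    whose sprint lasts at least target_s (or the slowest level if none does).

    level_power_w: {vf_level: per-core power in W for the running application} (assumed profile).
    """
    chosen = sorted(cores, key=lambda c: c.pcm_energy_j, reverse=True)[:n_sprint]
    budget = min(c.pcm_energy_j for c in chosen)   # first region to fully melt ends the sprint
    levels = sorted(level_power_w, reverse=True)   # higher number = faster level
    for level in levels:
        d = sprint_duration(budget, level_power_w[level], sustainable_w)
        if d >= target_s:
            return [c.cid for c in chosen], level, d
    slowest = levels[-1]
    return [c.cid for c in chosen], slowest, sprint_duration(budget, level_power_w[slowest], sustainable_w)

cores = [Core(0, 5.0), Core(1, 12.0), Core(2, 9.0), Core(3, 2.0)]
print(plan_sprint(cores, {1: 2.5, 2: 4.0, 3: 6.0}, sustainable_w=2.0, n_sprint=2, target_s=3.0))
# ([1, 2], 2, 4.5)
```

Here the fastest level would exhaust core 2's PCM in 2.25 s, so the planner drops one level to meet the 3 s target; a fixed-power policy would not adapt this way when the application's sprint power changes.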
A simulation framework for rapid prototyping and evaluation of thermal mitigation techniques in many-core architectures
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273485
Tanguy Sassolas, C. Sandionigi, Alexandre Guerre, Julien Mottin, P. Vivet, H. Boussetta, N. Peltier
Modern SoCs are characterized by increasing power density and, consequently, increasing temperature, which directly affects a device's performance, reliability, and cost (through its packaging). Thermal issues need to be predicted and mitigated as early as possible in the design flow, when optimization opportunities are greatest. In this paper, we present an efficient framework for designing dynamic thermal mitigation schemes, based on a high-level SystemC virtual prototype tightly coupled with efficient power and thermal simulation tools. We demonstrate the benefit of our approach through a comparison with silicon results for the SThorm 64-core architecture, and we report simulation-speed results showing that it is a sound solution for designing thermal mitigation early in the flow.
Citations: 2
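The kind of loop such a framework evaluates (a power model feeding a thermal model, with a mitigation policy closing the loop) can be sketched with a single-node RC thermal model and a threshold-based DVFS throttle with hysteresis. This is a stand-in for illustration only: the paper couples a SystemC virtual prototype to dedicated power and thermal simulators, and the R, C, power levels, and thresholds below are assumed values.

```python
def simulate_dtm(power_w, r_th=2.0, c_th=0.5, t_amb=45.0, t_hot=85.0, t_cool=80.0,
                 dt=0.01, steps=2000):
    """Forward-Euler simulation of C*dT/dt = P - (T - T_amb)/R with a hysteresis DTM policy.

    power_w: list of power (W) per DVFS level, slowest first (assumed values).
    Returns a list of (temperature_C, level) samples.
    """
    temp, level = t_amb, len(power_w) - 1   # start cold at the fastest level
    trace = []
    for _ in range(steps):
        temp += dt / c_th * (power_w[level] - (temp - t_amb) / r_th)
        if temp > t_hot and level > 0:
            level -= 1                      # too hot: throttle one level down
        elif temp < t_cool and level < len(power_w) - 1:
            level += 1                      # cooled off: restore one level
        trace.append((temp, level))
    return trace

trace = simulate_dtm([10.0, 15.0, 25.0])
peak = max(t for t, _ in trace)
print(f"peak temperature {peak:.1f} C")
```

With these values the unthrottled steady state would be 45 + 25·2 = 95 °C; the policy instead holds the node between roughly 80 and 85 °C by toggling between levels 1 and 2. Swapping the policy function is the kind of rapid experiment a faster, higher-level framework is meant to enable.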
Reference-circuit analysis for high-bandwidth spin transfer torque random access memory
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273541
Byungkyu Song, T. Na, Seong-ook Jung, Jung Pill Kim, Seung H. Kang
A global reference circuit (RC), in which one RC is shared by many sensing circuits (SCs), is being considered for high-bandwidth STT-RAMs because of its low power consumption and small area. However, using a global RC in high-bandwidth STT-RAMs causes droop and coupling-noise effects, leading to significant performance degradation. Thus, whether a global RC is a valid choice must be assessed. In this paper, the local RC and various global RCs are introduced and compared in terms of area, sensing time, and power consumption. By classifying the merits and drawbacks of the various RCs, we derive the following requirements for a proper RC in high-bandwidth STT-RAMs: 1) small area, 2) no performance degradation, 3) low power consumption, and 4) process-variation-tolerant reference signal generation.
Citations: 4
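The sketch below shows why the reference matters in STT-RAM sensing: the sense amplifier compares the cell read current against a reference that should sit between the parallel (low-resistance) and antiparallel (high-resistance) currents, and any shift of that reference, such as droop on a shared global RC, eats into one side of the margin. The midpoint reference, resistances, TMR value, and 5 µA droop are illustrative assumptions, not figures from the paper.

```python
def read_currents(v_read, r_p, tmr):
    """Cell currents (A) for parallel and antiparallel MTJ states; TMR = (R_AP - R_P) / R_P."""
    r_ap = r_p * (1.0 + tmr)
    return v_read / r_p, v_read / r_ap

def sensing_margin(i_p, i_ap, i_ref):
    """Worst-case distance (A) between the reference and either cell current."""
    return min(i_p - i_ref, i_ref - i_ap)

i_p, i_ap = read_currents(v_read=0.1, r_p=2e3, tmr=1.0)  # about 50 uA and 25 uA
i_ref = 0.5 * (i_p + i_ap)                               # assumed midpoint reference, 37.5 uA
ideal = sensing_margin(i_p, i_ap, i_ref)                 # 12.5 uA
drooped = sensing_margin(i_p, i_ap, i_ref - 5e-6)        # hypothetical 5 uA droop -> 7.5 uA
print(f"margin: {ideal * 1e6:.1f} uA ideal, {drooped * 1e6:.1f} uA with droop")
```

A 5 µA droop removes 40% of the worst-case margin here, which is why the requirements above include both "no performance degradation" and a variation-tolerant reference.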
Design and optimization of a reconfigurable power delivery network for large-area, DVS-enabled OLED displays
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273507
Woojoo Lee, Yanzhi Wang, Donghwa Shin, Shahin Nazarian, Massoud Pedram
Dynamic voltage scaling (DVS) has proven effective in minimizing the power consumption of OLED displays while causing only minimal image distortion. This technique has been extended to zone-specific DVS by dividing the panel area into zones and applying independent DVS to each zone based on the displayed content. Zone-specific DVS has not yet been applied to large-area OLED displays, partly because of the high overhead of a dedicated DC-DC converter for each zone and the low conversion efficiency when a converter's load current lies outside its desirable range. To address this issue, this work proposes a reconfigurable power delivery network architecture, composed of a small number of DC-DC converters, a switch network, and an online controller, to realize fine-grained (zone-specific) DVS in large-area OLED display panels. The proposed framework consistently achieves high power conversion efficiency and significant energy savings while preserving image quality. Experimental results demonstrate that up to 36% power savings can be achieved in a 65-inch 4K ultra-high-definition OLED display using the proposed framework.
Citations: 3
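The allocation problem behind a reconfigurable power delivery network (many zones, few DC-DC converters, a switch network in between) can be illustrated with a small dynamic program. It sorts zones by required voltage, splits them into at most k contiguous groups, and sets each converter to the highest voltage in its group. The cost, excess voltage times zone current, is a hypothetical proxy for wasted power; converter efficiency versus load current, which the paper also considers, is not modeled, and `assign_zones` is an illustrative name.

```python
def assign_zones(zones, k):
    """zones: list of (zone_id, required_volts, current_amps); k: number of converters.
    Returns (wasted_watts, groups), where each group shares one converter."""
    z = sorted(zones, key=lambda t: t[1])
    n = len(z)

    def group_cost(i, j):
        # Zones z[i:j] share a converter set to the highest voltage in the group.
        v_set = z[j - 1][1]
        return sum(amps * (v_set - volts) for _, volts, amps in z[i:j])

    inf = float("inf")
    g_max = min(k, n)
    best = [[inf] * (n + 1) for _ in range(g_max + 1)]  # best[g][j]: first j zones in g groups
    cut = [[0] * (n + 1) for _ in range(g_max + 1)]
    best[0][0] = 0.0
    for g in range(1, g_max + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                cand = best[g - 1][i] + group_cost(i, j)
                if cand < best[g][j]:
                    best[g][j], cut[g][j] = cand, i

    groups, j = [], n
    for g in range(g_max, 0, -1):   # walk the cut points back from the last zone
        i = cut[g][j]
        groups.append([zid for zid, _, _ in z[i:j]])
        j = i
    return best[g_max][n], groups[::-1]

zones = [("z0", 5.0, 1.0), ("z1", 5.1, 1.0), ("z2", 7.0, 2.0), ("z3", 7.2, 1.0)]
waste, groups = assign_zones(zones, k=2)
print(groups)  # [['z0', 'z1'], ['z2', 'z3']]
```

Re-running the assignment whenever the displayed content changes the zone voltages is the role an online controller would play; the switch network then routes each zone to its group's converter.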
Dynamic power management for many-core platforms in the dark silicon era: A multi-objective control approach
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273517
A. Rahmani, M. Haghbayan, A. Kanduri, Awet Yemane Weldezion, P. Liljeberg, J. Plosila, A. Jantsch, H. Tenhunen
Power management of NoC-based many-core systems with runtime application mapping becomes more challenging in the dark silicon era. It requires a multi-objective control approach that accounts for an upper limit on total power consumption, the dynamic behaviour of workloads, processing-element utilization, per-core power consumption, and network-on-chip load. In this paper, we propose a multi-objective dynamic power management method that simultaneously considers all of these parameters. Fine-grained voltage and frequency scaling, including near-threshold operation, and per-core power gating are used to optimize performance. In addition, a disturbance rejecter proactively scales down activity in running applications when a new application starts executing, preventing sharp power budget violations. Simulations of dynamic workloads and mixed time-critical application profiles show that, compared with state-of-the-art power management policies, our method honors the power budget while considerably boosting system throughput and reducing power budget violations.
Citations: 48
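Two control ideas in the abstract, feedback that keeps total power at the budget and a disturbance rejecter that acts before a newly started application draws power, can be illustrated with a one-knob toy. It assumes a single normalized V/f knob shared by all running cores and a linear power model; the gains and wattages are hypothetical, and the paper's controller, which also handles per-core power gating, NoC load, and near-threshold operation, is considerably richer.

```python
class PowerBudgetController:
    """Velocity-form PI controller acting on one normalised V/f knob in [0, 1]."""

    def __init__(self, budget_w, kp=0.002, ki=0.005):
        self.budget, self.kp, self.ki = budget_w, kp, ki
        self.knob = 1.0          # assumed knob shared by all running cores
        self.prev_err = 0.0

    def step(self, measured_w):
        err = self.budget - measured_w             # > 0 headroom, < 0 budget violation
        self.knob += self.kp * (err - self.prev_err) + self.ki * err
        self.knob = min(1.0, max(0.0, self.knob))  # keep the knob in its valid range
        self.prev_err = err

    def on_app_start(self, static_w, dyn_w_at_full_knob):
        # Disturbance rejection: cap the knob at the value the (assumed linear) power
        # model predicts will fit the budget once the new application is running.
        self.knob = min(self.knob, (self.budget - static_w) / dyn_w_at_full_knob)
        self.prev_err = 0.0


def simulate(reject, budget=60.0, static_w=10.0, dyn_w_per_app=40.0, steps=100, start=50):
    """Toy plant: power = static + dyn_per_app * running_apps * knob. Returns the power trace."""
    ctl = PowerBudgetController(budget)
    apps, trace = 1, []
    for k in range(steps):
        if k == start:
            apps += 1                              # a new application begins executing
            if reject:
                ctl.on_app_start(static_w, dyn_w_per_app * apps)
        power = static_w + dyn_w_per_app * apps * ctl.knob
        trace.append(power)
        ctl.step(power)
    return trace

with_rejecter, without_rejecter = simulate(reject=True), simulate(reject=False)
print(max(with_rejecter), max(without_rejecter))  # 60.0 90.0
```

With the rejecter enabled the trace never exceeds the 60 W budget; without it, the first sample after the second application starts reads 90 W before the feedback loop pulls power back down.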
High-efficiency crossbar switches using capacitively coupled signaling
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273497
Cagla Cakir, R. Ho, J. Lexau, K. Mai
As process technologies have scaled, the increasing number of processor cores and memories on a single die has driven the need for more complex on-chip interconnection networks. Crossbar switches are primary building blocks in such networks-on-chip, as they can be used as fast single-stage networks or as the core of the router switch in multi-stage networks. While crossbars offer non-blocking, single-hop, all-to-all communication, they tend to scale poorly with the number of nodes because of the latency and energy of the long wires and high-radix multiplexer structures they require. To combat these limitations, we propose a low-swing crossbar design that uses capacitively driven wires and capacitively coupled multiplexers. Capacitively driven wires offer low-swing signaling, higher bandwidth, and low energy consumption, while capacitively coupled multiplexers reduce parasitic loading from inactive inputs. We present a 16×16, 64-bit low-swing crossbar switch designed in a TSMC 40nm bulk CMOS process. Post-layout simulation shows it operating at a maximum frequency of 2.2GHz and achieving a bandwidth of 2.56Tb/s at 0.9V (nominal Vdd) with an area of 0.94mm². The total energy consumption at full, half, and minimum bandwidth is 110pJ, 84pJ, and 64pJ, respectively, giving an efficiency of 10.49 Tbps/W, a 3X improvement over previously published results.
Citations: 1
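Two quick calculations help when reading these numbers. First, in a capacitively driven wire the series coupling capacitor C_c and the wire capacitance C_w form a divider, so the swing is Vdd·C_c/(C_c + C_w) and the supply energy per charging transition is C_w·V_swing·Vdd instead of C_w·Vdd². Second, an efficiency in Tb/s per W converts directly to energy per bit, since 1 Tb/s/W equals 1 pJ/bit. The capacitance values are hypothetical (the abstract does not give them), and the sketch ignores the DC biasing such links need.

```python
def cap_driven_swing(vdd, c_couple, c_wire):
    """Wire swing (V) set by the divider between the series coupling cap and the wire cap."""
    return vdd * c_couple / (c_couple + c_wire)

def supply_energy(vdd, c_wire, v_swing):
    """Energy (J) drawn from Vdd per charging transition: charge C_wire*V_swing delivered at Vdd."""
    return c_wire * v_swing * vdd

def fj_per_bit(tbps_per_watt):
    """Convert an efficiency in Tb/s per W into energy per bit in fJ (1 Tb/s/W = 1 pJ/bit)."""
    return 1000.0 / tbps_per_watt

vdd, c_wire, c_couple = 0.9, 200e-15, 50e-15     # hypothetical values
swing = cap_driven_swing(vdd, c_couple, c_wire)   # 0.18 V
saving = supply_energy(vdd, c_wire, vdd) / supply_energy(vdd, c_wire, swing)
print(f"swing {swing:.2f} V, {saving:.1f}x less energy than full swing")
print(f"10.49 Tbps/W is about {fj_per_bit(10.49):.1f} fJ/bit")
```

The energy saving equals (C_c + C_w)/C_c, so a smaller coupling capacitor trades signal swing, and therefore noise margin, for energy.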
Hardware-software interaction for run-time power optimization: A case study of embedded Linux on multicore smartphones
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273508
Anup Das, M. J. Walker, Andreas Hansson, B. Al-Hashimi, G. Merrett
Applications running on smartphones interact with the hardware and the system software differently, resulting in widely varying power consumption and hence thermal profiles. Typically, these smartphone platforms expose some hardware power control features to users, controlled through software governors such as cpufreq for dynamic voltage-frequency scaling (DVFS) and cpuquiet for dynamic core selection (DCS). Operating systems on these platforms manage these governors conservatively, independently of the applications' performance requirements. To address this, we propose an alternative approach that uses reinforcement learning to explore, at run-time, the trade-off between the power-saving opportunities offered by DVFS and DCS and application performance. The objective is to reduce power consumption, taking into account dynamic power, leakage power, and the interdependency between temperature and power. The reinforcement-learning-based control is implemented as a run-time manager (RTM) and validated in a case study on an ARM A15-based NVIDIA Tegra smartphone. The RTM interfaces with various hardware performance counters and with the embedded Linux operating system through (1) the cpuquiet API, to select cores at run-time, and (2) the cpufreq API, to scale the frequency of the active cores. Experiments with mobile and high-performance applications demonstrate that the proposed approach achieves an average power reduction of 22% (range 7-40%) compared to existing techniques.
Citations: 28
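As a rough illustration of the reinforcement-learning idea in the abstract above (not the authors' RTM, whose state, action and reward definitions are not given here), the sketch below applies tabular Q-learning to joint core-count (DCS) and frequency (DVFS) selection. Everything numeric is hypothetical: the operating points, the toy power model with a dynamic f·V² term plus a leakage term, the two alternating workload phases and their throughput targets, the miss penalty, and the learning hyperparameters. A real run-time manager would observe hardware performance counters and apply its decisions through the cpuquiet and cpufreq interfaces rather than a model, and would also account for the temperature-leakage coupling that this toy omits.

```python
import random

# Hypothetical operating points; real platforms expose their own.
CORES = (1, 2, 4)            # candidate active-core counts (DCS)
FREQS_GHZ = (0.6, 1.2, 1.8)  # candidate core frequencies (DVFS)
ACTIONS = [(c, f) for c in CORES for f in FREQS_GHZ]

# Two hypothetical workload phases, each with a throughput target.
TARGETS = (1.0, 3.0)         # state 0 = light phase, state 1 = heavy phase


def power(cores, f):
    """Toy power model: dynamic (f*V^2) plus leakage (proportional to V), per core."""
    v = 0.5 + 0.3 * f        # assumed linear voltage-frequency relation
    return cores * (f * v * v + 0.1 * v)


def reward(state, action):
    """Negative power, with a large penalty if the phase's throughput target is missed."""
    cores, f = action
    perf = cores * f         # assumes perfectly parallel work
    penalty = 10.0 if perf < TARGETS[state] else 0.0
    return -power(cores, f) - penalty


def train(steps=20000, alpha=0.1, gamma=0.5, eps=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning over (phase, operating point) pairs."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in TARGETS]
    state = 0
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(len(ACTIONS))                            # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[state][i])    # exploit
        r = reward(state, ACTIONS[a])
        nxt = 1 - state      # phases simply alternate in this toy workload
        q[state][a] += alpha * (r + gamma * max(q[nxt]) - q[state][a])
        state = nxt
    return q


def policy(q):
    """Greedy (cores, frequency) choice for each workload phase."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]


if __name__ == "__main__":
    print(policy(train()))
```

Because the phases here alternate regardless of the chosen action, the learned greedy policy picks, per phase, the lowest-power operating point that still meets the target. In this toy model that is two cores at the lowest frequency for the light phase and four cores at the middle frequency for the heavy phase, which illustrates why coordinating DCS with DVFS can beat scaling frequency alone.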
Exploring power attack protection of resource constrained encryption engines using integrated low-drop-out regulators
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273503
Arvind Singh, Monodeep Kar, J. Ko, S. Mukhopadhyay
Power attack protection for encryption engines often comes at the expense of area, power, and/or performance overheads, making the design of a low-power, compact, yet secure encryption engine challenging. This paper explores the feasibility of using an on-chip low-dropout regulator (LDO) as a countermeasure against power attacks on low-power, compact encryption engines. We design an area-minimized implementation of the Advanced Encryption Standard (AES) using a predictive 45nm technology node and show that lightweight implementations are more susceptible to power attacks. Using behavioral modeling, we show that an on-chip LDO can enhance the power attack resistance of this compact AES engine; however, the tradeoff between LDO performance and power attack protection must be carefully managed. Our analysis shows that the LDO can increase the power attack resistance of the compact AES engine by more than 800× with marginal area (1.4%) and power (5%) overheads.
{"title":"Exploring power attack protection of resource constrained encryption engines using integrated low-drop-out regulators","authors":"Arvind Singh, Monodeep Kar, J. Ko, S. Mukhopadhyay","doi":"10.1109/ISLPED.2015.7273503","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273503","url":null,"abstract":"The power attack protection of encryption engines often comes at the expense of area, power, and/or performance overheads making the design of a low-power and compact but secure encryption engine challenging. This paper explores the feasibility of using an on-chip low dropout regulator (LDO) as a countermeasure to power attack of low-power and compact encryption engine. We design an area minimized implementation of Advanced Encryption Standard (AES) using predictive 45nm node and show that lightweight implementations are more susceptible to power attack. Using behavioral modeling, we show that an on-chip LDO can enhance power attack resistance of this compact AES engine; however, the tradeoff between LDO performance and power attack protection is essential. Our analysis shows that LDO can increase power attack resistance of the compact AES by >800X with marginal area (1.4%) and power (5%) overheads.","PeriodicalId":421236,"journal":{"name":"2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123973268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
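The abstract above does not say how "power attack resistance" is quantified. Assuming it is measured as the number of power traces an attacker needs to disclose the key, a rule of thumb widely used in the correlation power analysis (CPA) literature relates that count to the correlation ρ between the attacker's leakage model and the measured supply current: n ≈ 3 + 8·z²(1−α) / ln²((1+ρ)/(1−ρ)). For small ρ the denominator is about 4ρ², so n grows roughly as 1/ρ². The sketch below is not from the paper; it only uses that formula to show what an 800× increase in trace count would imply for the correlation that must survive the LDO. The starting correlation of 0.05 and α = 0.0001 are illustrative assumptions.

```python
import math

# z-quantile for alpha = 0.0001 (99.99% confidence), a common choice for this rule of thumb.
Z_ALPHA = 3.719


def traces_needed(rho, z=Z_ALPHA):
    """Rule-of-thumb number of traces for a correlation power attack.

    rho is the (hypothetical) correlation between the leakage model and the
    measured power; a countermeasure such as an on-chip LDO lowers rho.
    """
    return 3 + 8 * (z / math.log((1 + rho) / (1 - rho))) ** 2


def required_attenuation(gain):
    """Since n ~ 1/rho^2 for small rho, a trace-count gain needs rho cut by about sqrt(gain)."""
    return math.sqrt(gain)


if __name__ == "__main__":
    rho_unprotected = 0.05                 # illustrative, not measured
    k = required_attenuation(800)          # roughly 28x lower correlation
    rho_with_ldo = rho_unprotected / k
    ratio = traces_needed(rho_with_ldo) / traces_needed(rho_unprotected)
    print(f"attenuation {k:.1f}x -> trace-count gain {ratio:.0f}x")
```

Under these assumptions the trace-count ratio comes out close to 800, meaning the LDO would need to reduce the correlation visible at the external supply by roughly 28×.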
Journal
2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)