Proceedings of the 2016 International Symposium on Low Power Electronics and Design: Latest Papers

Comprehensive Analysis, Modeling and Design for Hold-Timing Resiliency in Voltage Scalable Design
Huanyu Wang, Geng Xie, Jie Gu
Resiliency to timing violations is a crucial requirement for low-power electronics operating across a wide range of supply voltages. Although many existing solutions enhance setup-timing tolerance for higher performance, an accurate modeling and design strategy for hold resiliency that deals with the conflicting requirements of high and low voltages has not been established. This paper proposes a novel voltage-scalable modeling technique that leverages conventional static timing analysis and efficient statistical analysis to achieve accurate stochastic hold-timing analysis. Several highly nonlinear behaviors of circuit operation are also incorporated into the proposed model to achieve an accuracy within 10% of SPICE Monte Carlo simulation. Leveraging the proposed modeling technique, a novel hold-resilience design technique is proposed to eliminate the excessive hold fixing required for low-voltage operation and its associated performance degradation at high voltage, while remaining compatible with the conventional design closure flow. The proposed design methodology is demonstrated in a 45nm DSP processor design, enabling voltage-scalable operation from 0.35V to 0.9V and eliminating more than 20,000 hold buffers as well as the 23% performance degradation at high voltages due to hold fixing.
DOI: https://doi.org/10.1145/2934583.2934584
Published: 2016-08-08
Citations: 1
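As a rough illustration of what a stochastic hold-timing check looks at, the sketch below estimates how often a single flip-flop-to-flip-flop path violates hold under random delay variation. It is not the paper's voltage-scalable model: the textbook hold-slack inequality, the Gaussian variation, and all names and numbers (`hold_slack`, `hold_violation_rate`, the picosecond values) are assumptions chosen for illustration only.

```python
import random


def hold_slack(clk_to_q, min_logic_delay, hold_time, clock_skew):
    """Hold slack of one launch/capture path; a negative value is a hold violation."""
    return clk_to_q + min_logic_delay - hold_time - clock_skew


def hold_violation_rate(nominal, sigma_fraction, trials=10000, seed=0):
    """Estimate the hold-violation probability by Monte Carlo sampling.

    Each timing quantity is drawn from a Gaussian centred on its nominal
    value with standard deviation sigma_fraction * nominal (an assumed,
    simplified variation model).
    """
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        sample = {name: rng.gauss(value, sigma_fraction * value)
                  for name, value in nominal.items()}
        if hold_slack(**sample) < 0:
            violations += 1
    return violations / trials


# Hypothetical nominal delays in picoseconds (not taken from the paper).
nominal_ps = {"clk_to_q": 50.0, "min_logic_delay": 30.0,
              "hold_time": 20.0, "clock_skew": 40.0}

# Larger relative variation (as is typical at low supply voltage) raises the rate.
for sigma in (0.05, 0.20, 0.40):
    print(sigma, hold_violation_rate(nominal_ps, sigma))
```

Sweeping `sigma_fraction` stands in for the growth of delay variability at lower supply voltages. A real analysis would take per-voltage delay distributions from characterization data instead of a single Gaussian.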
Voltage Noise Induced DRAM Soft Error Reduction Technique for 3D-CPUs
Tiantao Lu, Caleb Serafy, Zhiyuan Yang, Ankur Srivastava
Three-dimensional integration enables stacking DRAM on top of the CPU, providing high bandwidth and low latency. However, non-uniform voltage fluctuations and local thermal hotspots in the CPU layers are coupled into the DRAM layers, causing a non-uniform distribution of bit-cell leakage (and thereby bit flips). We propose a performance-power-resilience simulation framework to capture DRAM soft errors in 3D multi-core CPU systems. A dynamic resilience management (DRM) scheme is investigated, which adaptively tunes the CPU's operating points to adjust the DRAM's voltage noise and thermal condition at runtime. The DRM uses dynamic frequency scaling to achieve a resilience borrow-in strategy, which effectively enhances the DRAM's resilience without sacrificing performance.
DOI: https://doi.org/10.1145/2934583.2934589
Published: 2016-08-08
Citations: 5
Session details: Novel Technologies & Resilience Design
Swaroop Ghosh, Tsung-Te Liu
DOI: https://doi.org/10.1145/3256011
Published: 2016-08-08
Citations: 0
Performance Impact of Magnetic and Thermal Attack on STTRAM and Low-Overhead Mitigation Techniques
Jaedong Jang, Swaroop Ghosh
In this paper, we analyze the fundamental vulnerabilities of Spin-Torque-Transfer RAM (STTRAM) to magnetic field and temperature that can be exploited by adversaries intent on triggering soft performance failures. We present novel attack vectors and their impact on memory performance (i.e., read, write, and retention). We propose a novel low-overhead clock frequency-adaptation technique to mitigate the attack. Our analysis indicates that slowing the clock frequency by 85% restores 170 mV of sense margin under a 300 Oe DC magnetic field. In addition, a 66% operating clock slowdown allows STTRAM to tolerate an AC magnetic field of over 300 Oe.
DOI: https://doi.org/10.1145/2934583.2934614
Published: 2016-08-08
Citations: 9
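To put the STTRAM slowdown figures above in terms of access time, the small calculation below converts a percentage clock slowdown into the factor by which the clock period stretches. It assumes that "slowing the clock by X%" means the frequency drops to (100 - X)% of nominal; the abstract does not spell this out, and the function name `period_multiplier` is hypothetical.

```python
def period_multiplier(slowdown_percent):
    """Factor by which the clock period grows when frequency drops by slowdown_percent."""
    remaining_fraction = 1.0 - slowdown_percent / 100.0
    if remaining_fraction <= 0.0:
        raise ValueError("slowdown must be below 100%")
    return 1.0 / remaining_fraction


# Under the stated assumption, an 85% slowdown stretches the period about 6.67x
# and a 66% slowdown about 2.94x.
for percent in (85, 66):
    print(percent, round(period_multiplier(percent), 2))
```

Read this way, the mitigation buys sense margin by giving each access several times longer to resolve, which is why it costs performance only while an attack is in progress.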
Software Infrastructure for Enabling FPGA-Based Accelerations in Data Centers: Invited Paper
J. Cong, Muhuan Huang, P. Pan, Di Wu, Peng Zhang
This paper focuses on the development of an infrastructure to enable FPGA-based acceleration in data centers. We present an initial version of an integrated solution that includes automated compilation for accelerator generation, runtime accelerator resource scheduling and management, and acceleration libraries for FPGA-based customized computing for big data applications. The solution can help overcome some of the main challenges of FPGA-based accelerated computing. It has the potential to bring significant performance and energy-efficiency improvements to data center applications.
DOI: https://doi.org/10.1145/2934583.2953984
Published: 2016-08-08
Citations: 26
Therma: Thermal-aware Run-time Thread Migration for Nanophotonic Interconnects
Majed Valad Beigi, G. Memik
In this paper, we introduce Therma, a thermal-aware run-time thread migration mechanism for managing temperature fluctuations in nanophotonic networks. Nanophotonics is one of the most promising communication substrate candidates for next-generation high-performance systems. However, its underlying components are sensitive to temperature fluctuations. These fluctuations arise mostly from temperature changes on the cores, which are adjacent to the nanophotonic components. Therma minimizes thermal fluctuations on these temperature-sensitive components by moving threads across cores. Evaluation results reveal that when each core executes a single thread, Therma achieves 15.4% and 6.1% reductions in photonic power consumption compared to the baseline and an interconnect-oblivious thread migration scheme, respectively. It also reduces photonic power consumption by up to 20.7% compared to the alternatives when running multiple threads per core.
DOI: https://doi.org/10.1145/2934583.2934592
Published: 2016-08-08
Citations: 8
An Energy-Efficient PUF Design: Computing While Racing
Hongxiang Gu, T. Xu, M. Potkonjak
Physical unclonable functions (PUFs) take advantage of the effect of process variation on hardware to obtain their unclonability. Traditional PUF designs focus only on the analog signals of circuits. An arbiter PUF, for example, generates responses by racing delay signals. Implementations of such PUFs usually incur large area and power consumption while providing very low throughput. To address this problem, we propose an energy-efficient PUF design that races analog signals and computes digital logic simultaneously. More importantly, the analog portion of the circuit (racing) shares a large amount of hardware resources with the digital portion (computing), introducing only a small overhead in area and power. Our test results on Spartan-6 field-programmable gate array (FPGA) platforms indicate that, by combining the two outputs, our design enables much larger PUF output throughput, better randomness, and lower power consumption than traditional PUFs.
DOI: https://doi.org/10.1145/2934583.2934604
Published: 2016-08-08
Citations: 3
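The remark in the PUF abstract above that an arbiter PUF "generates responses by racing delay signals" can be illustrated with a toy software model of a conventional arbiter PUF. This is only the traditional baseline the abstract contrasts against, not the proposed compute-while-racing design; the Gaussian per-stage delays and the names `make_puf` and `response` are assumptions made for the sketch.

```python
import random


def make_puf(stages, seed):
    """Create one simulated chip: four random segment delays per switch stage.

    The seed stands in for chip-specific process variation.
    """
    rng = random.Random(seed)
    return [[rng.gauss(1.0, 0.05) for _ in range(4)] for _ in range(stages)]


def response(puf, challenge):
    """Race two signals through the stages; return 1 if the top path arrives first."""
    top = bottom = 0.0
    for delays, bit in zip(puf, challenge):
        straight_top, straight_bottom, cross_top, cross_bottom = delays
        if bit == 0:
            # Signals stay on their own rails.
            top, bottom = top + straight_top, bottom + straight_bottom
        else:
            # Signals swap rails in this stage.
            top, bottom = bottom + cross_top, top + cross_bottom
    return 1 if top < bottom else 0


chip_a = make_puf(stages=64, seed=1)
challenge = [random.Random(42).randint(0, 1) for _ in range(64)]
print(response(chip_a, challenge))
```

Each challenge yields a single response bit after a full race through every stage, which is one way to see why throughput is low in the traditional design.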
Low Area, Low Power, Robust, Highly Sensitive Error Detecting Latch for Resilient Architectures
Weizhe Hua, R. Tadros, P. Beerel
Operating at lower supply voltages to meet ever-increasing demands for power efficiency unfortunately aggravates process, voltage, and temperature (PVT) variability. Resilient architectures have emerged as a promising way to mitigate the widening worst-case margins at these voltages. In particular, timing-resilient architectures use extra circuitry to detect timing violations and recover normal operation. The error detecting latch (EDL) is an efficient circuit that helps perform this task. This paper proposes two EDL architectures that achieve as much as 11.2% less power consumption, 20.8% less leakage, 7.8% smaller area, and 18.2% better sensitivity to glitches compared to state-of-the-art EDLs. The two architectures offer different trade-offs, one favoring lower power over robustness and the other the reverse. The paper also proposes a comprehensive power metric encapsulating many of the energy aspects discussed in the literature.
DOI: https://doi.org/10.1145/2934583.2934600
Published: 2016-08-08
Citations: 10
Dissecting Xeon + FPGA: Why the integration of CPUs and FPGAs makes a power difference for the datacenter: Invited Paper
H. Schmit, Randy Huang
Intel's Xeon roadmap includes package-integrated FPGAs in every new generation. In this talk, we will dissect why this is such a powerful combination at this time of great change in datacenter workloads. We will show how power savings within the CPU complex are a significant multiplier for power savings in the datacenter as a whole. Focusing on the domain of machine learning, we will present the recent evolution of data types and operators, and make the case that FPGAs are the path to facilitating this continued evolution. Finally, we will discuss the criticality of the close coupling of the CPU and the FPGA. This coupling facilitates the high-bandwidth, low-latency communication that is required for the development, debugging, and deployment of heterogeneous applications.
DOI: https://doi.org/10.1145/2934583.2953983
Published: 2016-08-08
Citations: 8
Scalable Auto-Tuning of Synthesis Parameters for Optimizing High-Performance Processors
M. Ziegler, Hung-Yi Liu, L. Carloni
Modern logic and physical synthesis tools provide numerous options and parameters that can drastically impact design quality; however, the large number of options leads to a complex design space difficult for human designers to navigate. By employing intelligent search strategies and parallel computing we can tackle this parameter tuning problem, thus automating one of the key design tasks conventionally performed by a human designer. In this paper we present a novel learning-based algorithm for synthesis parameter optimization. This new algorithm has been integrated into our existing autonomous parameter-tuning system, which was used to design multiple 22nm industrial chips and is currently being used for 14nm chips. These techniques show, on average, over 40% reduction in total negative slack and over 10% power reduction across hundreds of 14nm industrial processor macros while reducing overall human design effort. We also present a new higher-level system that manages parameter tuning of multiple designs in a scalable way. This new system addresses the needs of large design teams by prioritizing the tuning effort to maximize returns given the available compute resources.
DOI: https://doi.org/10.1145/2934583.2934620
Published: 2016-08-08
Citations: 15