Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays: latest papers

Binary stochastic implementation of digital logic
Yanzi Zhu, Peiran Suo, K. Bazargan
Stochastic computing refers to a mode of computation in which numbers are treated as probabilities implemented as 0/1 bit streams, which essentially is a unary encoding scheme. Previous work has shown significant reduction in area and increase in fault tolerance for low to medium resolution values (6-10 bits). However, this comes at very high latency cost. We propose a novel hybrid approach combining traditional binary with unary stochastic encoding, called binary stochastic. Similar to the binary representation, it is a positional number system, but instead of only 0/1 digits, the digits would be fractions. We show how simple logic such as adders and multipliers can be implemented, and then show more complex function implementations such as the gamma correction function and functions such as tanh, absolute and exponentiation using both combinational and sequential binary stochastic logic. Our experiments show significant reduction in latency compared to unary stochastic, while using significantly smaller area compared to binary implementations on FPGAs.
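The encoding described in the first sentence of this abstract can be made concrete with a minimal Python sketch. It illustrates conventional (unary) stochastic computing only, not the paper's binary stochastic scheme. It assumes independent random bit streams and relies on the standard property that a bitwise AND of two such streams encodes the product of their probabilities. The names `encode`, `decode` and `multiply`, and the stream length, are illustrative choices rather than anything taken from the paper.

```python
import random

def encode(p, n, rng):
    """Encode a value p in [0, 1] as n bits, each 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def decode(stream):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(stream) / len(stream)

def multiply(a, b):
    """Bitwise AND of two independent streams encodes the product of their values."""
    return [x & y for x, y in zip(a, b)]

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 1 << 14              # long streams are needed for modest precision
a = encode(0.5, n, rng)
b = encode(0.25, n, rng)
product = decode(multiply(a, b))
print(product)           # close to 0.125, up to sampling noise
```

The stream length is where the latency cost mentioned in the abstract comes from: each extra bit of resolution roughly doubles the number of bits that must be streamed through the single AND gate.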
DOI: https://doi.org/10.1145/2554688.2554778 · Published: 2014-02-26
Citations: 6
A new basic logic structure for data-path computation (abstract only)
P. Gaillardon, L. Amarù, G. Micheli
Nowadays, Field Programmable Gate Arrays (FPGAs) implement arithmetic functions using specific circuits at the logic block level, such as carry paths, or at the structure level, by adopting Digital Signal Processing (DSP) blocks. Nevertheless, all these approaches, introduced to ease the realization of specific functions, lack generality. In this paper, we introduce a new logic block that natively realizes arithmetic functions while preserving the versatility to implement general logic functions. It consists of a partially interconnected matrix of signal routers driven by comparators. We demonstrate that this structure can realize (i) any 2-output 2-input logic function, (ii) any single-output 3-input logic function, or (iii) specific logic, such as arithmetic functions, with up to 4 outputs and 8 inputs. Compared to a standard 6-input Look-Up Table (LUT), the proposed block requires roughly the same area but is 35.3% faster. Even though the proposed block does not have the exhaustive configurability of a 6-input LUT, there are arithmetic functions realizable in a single block that do not fit in one, or even several, 6-input LUTs. For example, a single block inherently implements an entire 3-bit adder, which requires 3× more resources when built from LUTs plus custom circuitry. From a system-level perspective, we show that a 256-bit adder is implemented with a 31% gain in area×delay product compared to its traditional LUT-based counterpart.
DOI: https://doi.org/10.1145/2554688.2554701 · Published: 2014-02-26
Citations: 1
An automatic netlist and floorplanning approach to improve the MTTR of scrubbing techniques (abstract only)
Bernhard Schmidt, Daniel Ziener, J. Teich
We introduce a new SEU mitigation approach which minimizes the scrubbing effort by a) using an automatic classification of the criticality of netlist instances and their resulting configuration bits, and b) minimizing the number of frames which must be scrubbed by using intelligent floorplanning. The criticality of a configuration bit is defined by the actions needed to correct a radiation-induced SEU at this bit. Indeed, circuits that involve feedback loops might continue to malfunction indefinitely even if scrubbing is applied to the configuration frames involved. Here, only supplementary state restoring might be a viable solution. By analyzing an FPGA design at the logic level and partitioning the configuration bits of the resulting FPGA mapping into so-called essential bits and critical bits, we are able to significantly reduce the number of time-consuming state-restoring actions. Moreover, we show how placement and routing constraints can be used to minimize the number of frames which have to be reconfigured or checked when scrubbing. By applying both methods, we show a reduction of the Mean-Time-To-Repair (MTTR) for sequential benchmark circuits of up to 48.5% compared to a state-of-the-art approach.
DOI: https://doi.org/10.1145/2554688.2554730 · Published: 2014-02-26
Citations: 1
Energy-efficient multiplier-less discrete convolver through probabilistic domain transformation
Mohammed Alawad, Yu Bai, R. Demara, Mingjie Lin
Energy efficiency and algorithmic robustness are typically conflicting circuit characteristics, yet with CMOS technology scaling towards the 10-nm feature size, both are simultaneously becoming critical design metrics for modern logic circuits. This paper proposes a novel computing scheme hinged on probabilistic domain transformation, aiming for both low-power operation and fault resilience. In such a computing paradigm, algorithm inputs are first encoded through probabilistic means, which translates the input values into a number of random samples. Subsequently, lightweight operations, such as simple additions, are performed on these random samples in order to generate new random variables. Finally, the resulting random samples are decoded probabilistically to give the final results.
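One way to picture the encode, add, decode pipeline in this abstract is the standard probability fact that the distribution of a sum of two independent random variables is the discrete convolution of their distributions. The Python sketch below illustrates only that fact and is not the paper's circuit. It assumes both inputs are non-negative and normalised to sum to 1, which a real signal would need scaling to satisfy. All function names are hypothetical.

```python
import random
from collections import Counter

def sample_index(pmf, rng):
    """Encode: draw an index i with probability pmf[i]."""
    return rng.choices(range(len(pmf)), weights=pmf, k=1)[0]

def convolve_by_sampling(f, g, n_samples, rng):
    """Estimate the discrete convolution of two pmfs using only additions on samples."""
    sums = Counter(
        sample_index(f, rng) + sample_index(g, rng) for _ in range(n_samples)
    )
    # Decode: the normalised histogram of the sums approximates (f * g)[k].
    return [sums[k] / n_samples for k in range(len(f) + len(g) - 1)]

def convolve_exact(f, g):
    """Reference multiply-accumulate convolution, used here only for comparison."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

rng = random.Random(1)            # fixed seed so the run is repeatable
f = [0.2, 0.5, 0.3]
g = [0.6, 0.4]
estimate = convolve_by_sampling(f, g, 50_000, rng)
exact = convolve_exact(f, g)      # approximately [0.12, 0.38, 0.38, 0.12]
```

No multiplication appears on the sampled path: the only arithmetic applied to the samples is an integer addition, and accuracy is traded against the number of samples drawn.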
DOI: https://doi.org/10.1145/2554688.2554769 · Published: 2014-02-26
Citations: 46
Redefining the role of FPGAs in the next generation avionic systems (abstract only)
V. Viswanathan, R. B. Atitallah, J. Dekeyser, Benjamin Nakache, M. Nakache
Embedded reconfigurable computing is becoming a new paradigm for system designers in avionic applications. In fact, FPGAs can be used for more than just computational purposes in order to improve system performance. The introduction of the FPGA Mezzanine Card (FMC) I/O standard has given FPGAs a new purpose as a communication platform. Taking into account the features offered by FPGAs and FMCs, such as runtime reconfiguration and modularity, we have redefined the role of these devices as a generic communication and computation-centric platform. A new modular, runtime-reconfigurable, Intellectual Property (IP)-based, communication-centric platform for avionic applications has been designed. This means that, when the communication requirements of an avionic system change, the necessary communication protocol is installed and executed on demand, without disturbing the normal operation of a time-critical avionic system. The efficiency and performance of our platform are illustrated through a real industrial use case designed using a computationally intensive application and several avionic I/O bus standards. The reconfiguration latency can be completely hidden in many cases, while in others the overhead of reconfiguration can be justified by the reduction in resource utilization.
DOI: https://doi.org/10.1145/2554688.2554744 · Published: 2014-02-26
Citations: 2
Power estimation tool for system on programmable chip based platforms (abstract only)
S. Rethinagiri, Oscar Palomar, A. Cristal, O. Unsal
The ever-increasing complexity of applications results in the development of power-hungry processors. There is a scarcity of standalone tools that offer a good trade-off between estimation speed and accuracy for estimating power/energy at an early phase of the design flow. There are very few tools that address the design space exploration issue based on power and energy. In this paper, we propose a virtual-platform-based standalone power and energy estimation tool for System-on-Programmable Chip (SoPC) embedded platforms, which is independent of in-house tools. There are two steps involved in the development of this tool. The first step is power model generation. For the power model development, we used functional parameters to set up generic power models for the different parts of the system. This is a one-time activity. In the second step, a simulation-based virtual platform framework is developed to accurately evaluate the activities used in the power models developed in the first step. The combination of the two steps leads to a hybrid power estimation, which gives a better trade-off between accuracy and speed. The proposed tool has several benefits: it considers the power consumption of the embedded system in its entirety and leads to accurate estimates without costly and complex equipment. The proposed tool is also scalable for exploring complex embedded multi-core architectures. The effectiveness of our proposed tool is validated through a dual-core RISC processor designed around the FPGA board and extended to accommodate futuristic multi-core processors for reliable energy-based design space exploration. The accuracy of our proposed tool is evaluated using a variety of industrial benchmarks such as Multimedia, EEMBC and SPEC2006. Estimated power values are compared to real board measurements and also to McPAT. Our power/energy estimation results show less than 9% error for heterogeneous MPSoC-based systems and are 200% faster compared to other state-of-the-art power estimation tools.
DOI: https://doi.org/10.1145/2554688.2554718 · Published: 2014-02-26
Citations: 0
Optimally mitigating BTI-induced FPGA device aging with discriminative voltage scaling (abstract only)
Yu Bai, Mohammed Alawad, Mingjie Lin
With CMOS technology aggressively scaling towards the 22nm node, modern FPGA devices face tremendous aging-induced reliability challenges due to Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI). This paper presents a novel anti-aging technique at the logic level that is both scalable and applicable to VLSI digital circuits implemented with FPGA devices. The key idea is to prolong the lifetime of FPGA-mapped designs by strategically elevating the VDD values of some LUTs based on their modular criticality values. Although the idea of scaling VDD in order to improve either energy efficiency or circuit reliability has been explored extensively, our study distinguishes itself by approaching this challenge through an analytical procedure, and is therefore able to maximize the overall reliability of the target FPGA design by rigorously modelling the BTI-induced device reliability and optimally solving the VDD assignment problem.
DOI: https://doi.org/10.1145/2554688.2554752 · Published: 2014-02-26
Citations: 0
Future inter-FPGA communication architecture for multi-FPGA based prototyping (abstract only)
Qingshan Tang, M. Tuna, H. Mehrez
Multi-FPGA boards are widely used for rapid system prototyping. Even though prototyping aims for maximum performance, that performance is limited by inter-FPGA communication. As the capacity per I/O increases with each FPGA generation, FPGA I/Os are becoming a scarce resource. The design is divided into several parts, each of which fits in a single FPGA. Signals that cross between design parts located in different FPGAs are called cut nets. In order to resolve the pin limitation problem, cut nets are sent between FPGAs in a pipelined way using the Time-Division-Multiplexing (TDM) technique. The maximum number of cut nets passing through one FPGA I/O is called the TDM ratio. There are two multiplexing architectures used for multi-FPGA based prototyping: Logic Multiplexing and ISERDES/OSERDES. In this paper, a new multiplexing architecture, Multi-Gigabit Transceiver (MGT), is proposed. Experiments are done on a multi-FPGA board with an LFSR testbench to validate the achieved performance. Assuming that all the FPGA I/Os used for inter-FPGA communication will be MGT-capable in the future, analyses show that the proposed multiplexing architecture can achieve higher performance when the TDM ratio exceeds 67. The performance gain of the proposed architecture over the existing architectures grows as the TDM ratio increases.
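The TDM ratio defined in this abstract can be illustrated with a small calculation. The Python sketch below assumes that the cut nets between a pair of FPGAs are spread as evenly as possible over the I/Os available between them, so the ratio for that pair is a ceiling division. The board layout and all numbers are invented for illustration and do not come from the paper.

```python
import math

def tdm_ratio(cut_nets, io_pins):
    """Largest number of cut nets sharing one I/O when nets are spread evenly."""
    if io_pins <= 0:
        raise ValueError("io_pins must be positive")
    return math.ceil(cut_nets / io_pins)

# Hypothetical partitioning result: cut nets and usable I/Os per FPGA pair.
cut_nets = {("FPGA0", "FPGA1"): 1200, ("FPGA1", "FPGA2"): 300}
io_pins = {("FPGA0", "FPGA1"): 40, ("FPGA1", "FPGA2"): 60}

ratios = {pair: tdm_ratio(cut_nets[pair], io_pins[pair]) for pair in cut_nets}
worst = max(ratios.values())   # the most heavily multiplexed link
print(ratios, worst)
```

Under this assumption the most heavily multiplexed link determines which side of the reported crossover (a TDM ratio of 67) a given design falls on.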
DOI: https://doi.org/10.1145/2554688.2554747 · Published: 2014-02-26
Citations: 2
Co-processing with dynamic reconfiguration on heterogeneous MPSoC: practices and design tradeoffs (abstract only)
Chao Wang, Xi Li, Xuehai Zhou, Yunji Chen, K. Bertels
Reconfiguration has been considered one of the most promising electronic design automation (EDA) technologies in MPSoC design paradigms. However, because latency in the reconfiguration procedure is unavoidable, efficiently analyzing the trade-offs among software/hardware execution, static reconfiguration, and dynamic reconfiguration remains a significant challenge. In this paper we first present a heterogeneous MPSoC middleware that supports state-of-the-art dynamic partial reconfiguration technologies. We then evaluate the reconfiguration latency and analyze the trade-offs of dynamic partial reconfiguration. As a practical study, a heterogeneous MPSoC prototype running a JPEG application has been developed on a Xilinx Zynq FPGA with state-of-the-art static/dynamic partial reconfiguration technologies. Experimental results on the JPEG case studies demonstrate the trade-offs among software execution, hardware execution, and static/dynamic reconfiguration. Quantitatively, we report the execution time for different configurations of the hardware steps in JPEG and the impact of dynamic reconfiguration on execution. Dynamic reconfiguration yields performance benefits for large-scale computational tasks (those larger than a certain threshold). Furthermore, overheads and HWICAP hardware utilization have been measured and discussed. This work was supported by NSFC grants No. 61379040, No. 61272131, and No. 61202053.
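The threshold effect mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple linear cost model that is not taken from the paper: software time grows as `sw_cost_per_item * n`, while hardware time pays a fixed reconfiguration latency `t_reconfig` plus `hw_cost_per_item * n`. The function names and all numbers are hypothetical.

```python
import math


def sw_time(n, sw_cost_per_item):
    """Assumed software execution time for a task of n items (linear model)."""
    return sw_cost_per_item * n


def hw_time(n, t_reconfig, hw_cost_per_item):
    """Assumed hardware time: fixed reconfiguration latency plus linear work."""
    return t_reconfig + hw_cost_per_item * n


def break_even_size(t_reconfig, sw_cost_per_item, hw_cost_per_item):
    """Smallest integer n for which hardware is strictly faster than software.

    Returns None when hardware is never faster under this model.
    """
    if hw_cost_per_item >= sw_cost_per_item:
        return None
    # Hardware wins when t_reconfig + hw*n < sw*n, i.e. n > t_reconfig / (sw - hw).
    return math.floor(t_reconfig / (sw_cost_per_item - hw_cost_per_item)) + 1


if __name__ == "__main__":
    # Illustrative numbers only (arbitrary time units), not measurements.
    n_star = break_even_size(t_reconfig=10.0, sw_cost_per_item=2.0, hw_cost_per_item=1.0)
    print("hardware pays off from task size", n_star)
```

Under this model, tasks below the break-even size run faster in software because the reconfiguration latency dominates; the actual threshold on a real platform would have to be measured.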
DOI: https://doi.org/10.1145/2554688.2554695 (published 2014-02-26)
Citations: 0
Session details: Tools and methods
J. Anderson
DOI: https://doi.org/10.1145/3260938 (published 2014-02-26)
Citations: 0