首页 > 最新文献

Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays最新文献

英文 中文
Binary stochastic implementation of digital logic 数字逻辑二进制随机实现
Yanzi Zhu, Peiran Suo, K. Bazargan
Stochastic computing refers to a mode of computation in which numbers are treated as probabilities implemented as 0/1 bit streams, which essentially is a unary encoding scheme. Previous work has shown significant reduction in area and increase in fault tolerance for low to medium resolution values (6-10 bits). However, this comes at very high latency cost. We propose a novel hybrid approach combining traditional binary with unary stochastic encoding, called binary stochastic. Similar to the binary representation, it is a positional number system, but instead of only 0/1 digits, the digits would be fractions. We show how simple logic such as adders and multipliers can be implemented, and then show more complex function implementations such as the gamma correction function and functions such as tanh, absolute and exponentiation using both combinational and sequential binary stochastic logic. Our experiments show significant reduction in latency compared to unary stochastic, while using significantly smaller area compared to binary implementations on FPGAs.
随机计算指的是一种计算模式,其中数字被视为实现为0/1比特流的概率,本质上是一种一元编码方案。以前的工作表明,在低到中等分辨率值(6-10位)下,面积显著减少,容错性增加。然而,这带来了非常高的延迟成本。我们提出了一种将传统二进制编码与一元随机编码相结合的新方法,称为二进制随机编码。类似于二进制表示,它是一个位置数字系统,但不是只有0/1的数字,这些数字将是分数。我们展示了如何实现简单的逻辑,如加法器和乘法器,然后展示了更复杂的函数实现,如伽马校正函数和函数,如tanh,绝对和幂,使用组合和顺序二进制随机逻辑。我们的实验表明,与一元随机相比,延迟显著减少,而与fpga上的二进制实现相比,使用的面积明显更小。
{"title":"Binary stochastic implementation of digital logic","authors":"Yanzi Zhu, Peiran Suo, K. Bazargan","doi":"10.1145/2554688.2554778","DOIUrl":"https://doi.org/10.1145/2554688.2554778","url":null,"abstract":"Stochastic computing refers to a mode of computation in which numbers are treated as probabilities implemented as 0/1 bit streams, which essentially is a unary encoding scheme. Previous work has shown significant reduction in area and increase in fault tolerance for low to medium resolution values (6-10 bits). However, this comes at very high latency cost. We propose a novel hybrid approach combining traditional binary with unary stochastic encoding, called binary stochastic. Similar to the binary representation, it is a positional number system, but instead of only 0/1 digits, the digits would be fractions. We show how simple logic such as adders and multipliers can be implemented, and then show more complex function implementations such as the gamma correction function and functions such as tanh, absolute and exponentiation using both combinational and sequential binary stochastic logic. Our experiments show significant reduction in latency compared to unary stochastic, while using significantly smaller area compared to binary implementations on FPGAs.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130490016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
A new basic logic structure for data-path computation (abstract only) 一种新的数据路径计算的基本逻辑结构(仅抽象)
P. Gaillardon, L. Amarù, G. Micheli
Nowadays, Field Programmable Gate Arrays (FPGA) implement arithmetic functions using specific circuits at the logic block level, such as the carry paths, or at the structure level adopting Digital Signal Processing (DSP) blocks. Nevertheless, all these approaches, introduced to ease the realization of specific functions, are lacking of generality. In this paper, we introduce a new logic block that natively realizes arithmetic functions while preserving the versatility to implement general logic functions. It consists of a partially interconnected matrix of signal routers driven by comparators. We demonstrate that this structure can realize (i) any 2-output 2-input logic function or (ii) any single-output 3-input logic function or (iii) specific logic, such as arithmetic functions, with up to 4-output and 8-inputs. As compared to a standard 6-input Look Up Table (LUT), the proposed block requires roughly the same area but is 35.3% faster. Even though the proposed block has not the same exhaustive configurability of a 6-input LUT, there are arithmetic functions realizable in a single block that do not fit in one, or even more, 6-input LUT. For example, a single block inherently implements an entire 3-bit adder that requires 3× more resources with LUTs plus also custom circuitry. From a system level perspective, we show that a 256-bit adder is implemented with a gain on area×delay product of 31% as compared to its traditional LUT-based counterpart.
目前,现场可编程门阵列(FPGA)在逻辑块级(如进位路径)或结构级(采用数字信号处理(DSP)块)使用特定电路实现算术功能。然而,所有这些方法都是为了简化特定功能的实现而引入的,缺乏通用性。在本文中,我们引入了一种新的逻辑块,它既能实现算术函数,又能保持实现一般逻辑函数的通用性。它由由比较器驱动的部分互连的信号路由器矩阵组成。我们证明了这种结构可以实现(i)任何2输出2输入逻辑函数或(ii)任何单输出3输入逻辑函数或(iii)特定逻辑,如算术函数,最多有4输出和8输入。与标准的6输入查找表(LUT)相比,建议的块需要大致相同的面积,但速度快35.3%。尽管所建议的块不具有6输入LUT的穷举可配置性,但在单个块中可以实现的算术函数并不适合一个或多个6输入LUT。例如,单个块固有地实现了一个完整的3位加法器,它需要3倍多的lut资源以及自定义电路。从系统级的角度来看,我们展示了一个256位加法器的实现,与传统的基于lut的加法器相比,area×delay产品的增益为31%。
{"title":"A new basic logic structure for data-path computation (abstract only)","authors":"P. Gaillardon, L. Amarù, G. Micheli","doi":"10.1145/2554688.2554701","DOIUrl":"https://doi.org/10.1145/2554688.2554701","url":null,"abstract":"Nowadays, Field Programmable Gate Arrays (FPGA) implement arithmetic functions using specific circuits at the logic block level, such as the carry paths, or at the structure level adopting Digital Signal Processing (DSP) blocks. Nevertheless, all these approaches, introduced to ease the realization of specific functions, are lacking of generality. In this paper, we introduce a new logic block that natively realizes arithmetic functions while preserving the versatility to implement general logic functions. It consists of a partially interconnected matrix of signal routers driven by comparators. We demonstrate that this structure can realize (i) any 2-output 2-input logic function or (ii) any single-output 3-input logic function or (iii) specific logic, such as arithmetic functions, with up to 4-output and 8-inputs. As compared to a standard 6-input Look Up Table (LUT), the proposed block requires roughly the same area but is 35.3% faster. Even though the proposed block has not the same exhaustive configurability of a 6-input LUT, there are arithmetic functions realizable in a single block that do not fit in one, or even more, 6-input LUT. For example, a single block inherently implements an entire 3-bit adder that requires 3× more resources with LUTs plus also custom circuitry. From a system level perspective, we show that a 256-bit adder is implemented with a gain on area×delay product of 31% as compared to its traditional LUT-based counterpart.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127897732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
An automatic netlist and floorplanning approach to improve the MTTR of scrubbing techniques (abstract only) 一种自动网表和地板规划方法来提高擦洗技术的MTTR(仅摘要)
Bernhard Schmidt, Daniel Ziener, J. Teich
We introduce a new SEU mitigation approach which minimizes the scrubbing effort by a) using an automatic classification of the criticality of netlist instances and their resulting configuration bits, and by b) minimizing the number of frames which must be scrubbed by using intelligent floorplanning. The criticality of configuration bits is defined by the actions needed to correct a radiation-induced SEU at this bit. Indeed, circuits that involve feedback loops might still and infinitely cause a malfunction even if scrubbing is applied to involved configuration frames. Here, only supplementary state-restoring might be a viable solution. By analyzing an FPGA design already at the logic level and partition configuration bits of the resulting FPGA mapping into so-called essential bits and critical bits, we are able to significantly reduce the number of time consuming state-restoring actions. Moreover, by using placement and routing constraints, it is shown how to minimize the number of frames which have to be reconfigured or checked when using scrubbing. By applying both methods, we will show a reduction of the Mean-Time-To-Repair (MTTR) for sequential benchmark circuits by up to 48.5% compared to a state-of-the-art approach.
我们引入了一种新的SEU缓解方法,通过a)使用对网表实例的临界性及其产生的配置位的自动分类,以及b)通过使用智能地板规划最小化必须擦洗的帧数,从而最大限度地减少擦洗工作。配置位的关键程度取决于在该位纠正辐射诱发的SEU所需的措施。实际上,包含反馈回路的电路,即使对所涉及的配置帧应用了擦洗,仍然可能无限地引起故障。在这里,只有补充状态恢复可能是一个可行的解决方案。通过分析已经在逻辑级别的FPGA设计,并将结果FPGA的配置位划分为所谓的基本位和关键位,我们能够显着减少耗时的状态恢复操作的数量。此外,通过使用位置和路由约束,展示了如何在使用擦洗时最小化必须重新配置或检查的帧的数量。通过应用这两种方法,我们将显示,与最先进的方法相比,顺序基准电路的平均维修时间(MTTR)减少了48.5%。
{"title":"An automatic netlist and floorplanning approach to improve the MTTR of scrubbing techniques (abstract only)","authors":"Bernhard Schmidt, Daniel Ziener, J. Teich","doi":"10.1145/2554688.2554730","DOIUrl":"https://doi.org/10.1145/2554688.2554730","url":null,"abstract":"We introduce a new SEU mitigation approach which minimizes the scrubbing effort by a) using an automatic classification of the criticality of netlist instances and their resulting configuration bits, and by b) minimizing the number of frames which must be scrubbed by using intelligent floorplanning. The criticality of configuration bits is defined by the actions needed to correct a radiation-induced SEU at this bit. Indeed, circuits that involve feedback loops might still and infinitely cause a malfunction even if scrubbing is applied to involved configuration frames. Here, only supplementary state-restoring might be a viable solution. By analyzing an FPGA design already at the logic level and partition configuration bits of the resulting FPGA mapping into so-called essential bits and critical bits, we are able to significantly reduce the number of time consuming state-restoring actions. Moreover, by using placement and routing constraints, it is shown how to minimize the number of frames which have to be reconfigured or checked when using scrubbing. By applying both methods, we will show a reduction of the Mean-Time-To-Repair (MTTR) for sequential benchmark circuits by up to 48.5% compared to a state-of-the-art approach.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123259606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Energy-efficient multiplier-less discrete convolver through probabilistic domain transformation 基于概率域变换的节能无乘子离散卷积器
Mohammed Alawad, Yu Bai, R. Demara, Mingjie Lin
Energy efficiency and algorithmic robustness typically are conflicting circuit characteristics, yet with CMOS technology scaling towards 10-nm feature size, both become critical design metrics simultaneously for modern logic circuits. This paper propose a novel computing scheme hinged on probabilistic domain transformation aiming for both low power operation and fault resilience. In such a computing paradigm, algorithm inputs are first encoded through probabilistic means, which translates the input values into a number of random samples. Subsequently, light-weight operations, such as sim- ple additions will be performed onto these random samples in order to generate new random variables. Finally, the resulting random samples will be decoded probabilistically to give the final results.
能量效率和算法鲁棒性通常是相互冲突的电路特性,但随着CMOS技术向10纳米特征尺寸的扩展,两者同时成为现代逻辑电路的关键设计指标。本文提出了一种基于概率域变换的新型计算方案,以实现低功耗运行和故障恢复。在这样的计算范式中,算法输入首先通过概率方法编码,将输入值转换为一些随机样本。随后,将对这些随机样本执行轻量级操作,例如简单的加法,以生成新的随机变量。最后,将得到的随机样本进行概率解码,从而得到最终结果。
{"title":"Energy-efficient multiplier-less discrete convolver through probabilistic domain transformation","authors":"Mohammed Alawad, Yu Bai, R. Demara, Mingjie Lin","doi":"10.1145/2554688.2554769","DOIUrl":"https://doi.org/10.1145/2554688.2554769","url":null,"abstract":"Energy efficiency and algorithmic robustness typically are conflicting circuit characteristics, yet with CMOS technology scaling towards 10-nm feature size, both become critical design metrics simultaneously for modern logic circuits. This paper propose a novel computing scheme hinged on probabilistic domain transformation aiming for both low power operation and fault resilience. In such a computing paradigm, algorithm inputs are first encoded through probabilistic means, which translates the input values into a number of random samples. Subsequently, light-weight operations, such as sim- ple additions will be performed onto these random samples in order to generate new random variables. Finally, the resulting random samples will be decoded probabilistically to give the final results.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114078938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 46
Redefining the role of FPGAs in the next generation avionic systems (abstract only) 重新定义fpga在下一代航空电子系统中的作用(仅摘要)
V. Viswanathan, R. B. Atitallah, J. Dekeyser, Benjamin Nakache, M. Nakache
Embedded reconfigurable computing is becoming a new paradigm for system designers in avionic applications. In fact, FPGAs can be used for more than just computational purpose in order to improve the system performance. The introduction of FPGA Mezzanine Card (FMC) I/O standard has given a new purpose for FPGAs to be used as a communication platform. Taking into account the features offered by FPGAs and FMCs, such as runtime reconfiguration and modularity, we have redefined the role of these devices to be used as a generic communication and computation-centric platform. A new modular, runtime reconfigurable, Intellectual Property (IP)-based communication-centric platform for avionic applications has been designed. This means that, when the communication requirement of an avionic system changes, the necessary communication protocol is installed and executed on demand, without disturbing the normal operation of a time-critical avionic system. The efficiency and the performances of our platform are illustrated through a real industrial use-case designed using a computationally intensive application and several avionic I/O bus standards. The reconfiguration latency can be hidden totally in many cases. While in certain others, the overhead of reconfiguration can be justified by the reduction in the resource utilization.
嵌入式可重构计算正在成为航空电子应用系统设计者的一种新范式。事实上,为了提高系统性能,fpga可以用于不仅仅是计算目的。FPGA mezz卡(FMC) I/O标准的引入,为FPGA作为通信平台提供了新的用途。考虑到fpga和fmc提供的功能,例如运行时重构和模块化,我们重新定义了这些设备的角色,将其用作通用的通信和以计算为中心的平台。为航空电子应用设计了一种新的模块化、运行时可重构、基于知识产权(IP)的通信中心平台。这意味着,当航空电子系统的通信需求发生变化时,必要的通信协议被安装并按需执行,而不会干扰时间关键型航空电子系统的正常运行。通过使用计算密集型应用程序和几种航空电子I/O总线标准设计的实际工业用例,说明了我们平台的效率和性能。在许多情况下,重新配置延迟可以完全隐藏。而在某些其他情况下,可以通过减少资源利用率来证明重新配置的开销是合理的。
{"title":"Redefining the role of FPGAs in the next generation avionic systems (abstract only)","authors":"V. Viswanathan, R. B. Atitallah, J. Dekeyser, Benjamin Nakache, M. Nakache","doi":"10.1145/2554688.2554744","DOIUrl":"https://doi.org/10.1145/2554688.2554744","url":null,"abstract":"Embedded reconfigurable computing is becoming a new paradigm for system designers in avionic applications. In fact, FPGAs can be used for more than just computational purpose in order to improve the system performance. The introduction of FPGA Mezzanine Card (FMC) I/O standard has given a new purpose for FPGAs to be used as a communication platform. Taking into account the features offered by FPGAs and FMCs, such as runtime reconfiguration and modularity, we have redefined the role of these devices to be used as a generic communication and computation-centric platform. A new modular, runtime reconfigurable, Intellectual Property (IP)-based communication-centric platform for avionic applications has been designed. This means that, when the communication requirement of an avionic system changes, the necessary communication protocol is installed and executed on demand, without disturbing the normal operation of a time-critical avionic system. The efficiency and the performances of our platform are illustrated through a real industrial use-case designed using a computationally intensive application and several avionic I/O bus standards. The reconfiguration latency can be hidden totally in many cases. While in certain others, the overhead of reconfiguration can be justified by the reduction in the resource utilization.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116212236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Power estimation tool for system on programmable chip based platforms (abstract only) 基于可编程芯片平台的系统功率估计工具(仅摘要)
S. Rethinagiri, Oscar Palomar, A. Cristal, O. Unsal
The ever increasing complexity of the applications result in the development of power hungry processors. There is a scarcity of standalone tools that have a good trade off between estimation speed and accuracy to estimate power/energy at an earlier phase of design flow. There are very few tools that addresses the design space exploration issue based on power and energy. In this paper, we propose a virtual platform based standalone power and energy estimation tool for System-on-Programmable Chip (SoPC) embedded platforms, which is independent of in-house tools. There are two steps involved in this tool development. The first step is power model generation. For the power model development, we used functional parameters to set up generic power models for the different parts of the system. This is a onetime activity. In the second step, a simulation based virtual platform framework is developed to evaluate accurately the activities used in the related power models developed in the first step. The combination of the two steps lead to a hybrid power estimation, which gives a better trade-off between accuracy and speed. The proposed tool has several benefits: it considers the power consumption of the embedded system in its entirety and leads to accurate estimates without a costly and complex material. The proposed tool is also scalable for exploring complex embedded multi-core architectures. The effectiveness of our proposed tool is validated through dualcore RISC processor designed around the FPGA board and extended to accommodate futuristic multi-core processors for a reliable energy based design space exploration. The accuracy of our proposed tool is evaluated by using a variety of industrial benchmarks such as Multimedia, EEMBC and SPEC2006. Estimated power values are compared to real board measurements and also to McPAT. Our obtained power/energy estimation results provide less than 9% of error for heterogeneous MPSoC based system and are 200% faster compared to other state-of-the-art power estimation tools.
不断增加的应用程序复杂性导致了耗电处理器的开发。在设计流程的早期阶段,很少有独立的工具能够很好地在估计速度和准确性之间进行权衡,以估计功率/能量。很少有工具能够解决基于电力和能源的设计空间探索问题。在本文中,我们提出了一个基于虚拟平台的独立的可编程芯片系统(SoPC)嵌入式平台的功耗和能量估计工具,它独立于内部工具。这个工具的开发涉及两个步骤。第一步是功率模型生成。对于功率模型的开发,我们使用功能参数建立了系统不同部分的通用功率模型。这是一次性的活动。在第二步中,开发了基于仿真的虚拟平台框架,以准确评估第一步中开发的相关功率模型中使用的活动。这两个步骤的结合导致了混合功率估计,它在精度和速度之间提供了更好的权衡。提出的工具有几个好处:它从整体上考虑嵌入式系统的功耗,并在没有昂贵和复杂材料的情况下进行准确的估计。该工具还可扩展,用于探索复杂的嵌入式多核体系结构。我们提出的工具的有效性通过围绕FPGA板设计的双核RISC处理器进行验证,并扩展到适应未来的多核处理器,以实现可靠的基于能源的设计空间探索。我们提出的工具的准确性通过使用各种工业基准,如多媒体,EEMBC和SPEC2006进行评估。估计的功率值与实际电路板测量值以及McPAT进行比较。我们获得的功率/能量估计结果为基于异构MPSoC的系统提供小于9%的误差,与其他最先进的功率估计工具相比,速度快200%。
{"title":"Power estimation tool for system on programmable chip based platforms (abstract only)","authors":"S. Rethinagiri, Oscar Palomar, A. Cristal, O. Unsal","doi":"10.1145/2554688.2554718","DOIUrl":"https://doi.org/10.1145/2554688.2554718","url":null,"abstract":"The ever increasing complexity of the applications result in the development of power hungry processors. There is a scarcity of standalone tools that have a good trade off between estimation speed and accuracy to estimate power/energy at an earlier phase of design flow. There are very few tools that addresses the design space exploration issue based on power and energy. In this paper, we propose a virtual platform based standalone power and energy estimation tool for System-on-Programmable Chip (SoPC) embedded platforms, which is independent of in-house tools. There are two steps involved in this tool development. The first step is power model generation. For the power model development, we used functional parameters to set up generic power models for the different parts of the system. This is a onetime activity. In the second step, a simulation based virtual platform framework is developed to evaluate accurately the activities used in the related power models developed in the first step. The combination of the two steps lead to a hybrid power estimation, which gives a better trade-off between accuracy and speed. The proposed tool has several benefits: it considers the power consumption of the embedded system in its entirety and leads to accurate estimates without a costly and complex material. The proposed tool is also scalable for exploring complex embedded multi-core architectures. The effectiveness of our proposed tool is validated through dualcore RISC processor designed around the FPGA board and extended to accommodate futuristic multi-core processors for a reliable energy based design space exploration. The accuracy of our proposed tool is evaluated by using a variety of industrial benchmarks such as Multimedia, EEMBC and SPEC2006. Estimated power values are compared to real board measurements and also to McPAT. Our obtained power/energy estimation results provide less than 9% of error for heterogeneous MPSoC based system and are 200% faster compared to other state-of-the-art power estimation tools.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116536653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Session details: Tools and methods 会话详细信息:工具和方法
J. Anderson
{"title":"Session details: Tools and methods","authors":"J. Anderson","doi":"10.1145/3260938","DOIUrl":"https://doi.org/10.1145/3260938","url":null,"abstract":"","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123799353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Co-processing with dynamic reconfiguration on heterogeneous MPSoC: practices and design tradeoffs (abstract only) 异构MPSoC上具有动态重构的协同处理:实践和设计权衡(仅摘要)
Chao Wang, Xi Li, Xuehai Zhou, Yunji Chen, K. Bertels
Reconfiguration technique has been considered as one of the most promising electronic design automation (EDA) technologies in MPSoC design paradigms. However, due to the unavoidable latency in the reconfiguration procedure, it still poses a significant challenge to efficiently analyze the trade-offs for the software/hardware execution, static reconfiguration and dynamic reconfiguration. In this paper we first present a heterogeneous MPSoC middleware to support state-of-the-art dynamic partial reconfigurable technologies. Furthermore, we evaluate the reconfiguration latency and analyze the trade-off for the dynamic partial reconfiguration technologies. As a practical study, a heterogeneous MPSoC prototype with JPEG application has been developed on Xilinx Zynq FPGA with state-of-the-art static/dynamic partial reconfigurable technologies. Experimental results on the JPEG case studies demonstrated the leverage among the software execution, hardware execution, and static/dynamic reconfiguration. For the quantitative approach, we have demonstrated the execution time for the different configuration of the hardware steps in JPEG, and the quantitative impact of the dynamic reconfiguration execution. The dynamic reconfiguration could gain the performance benefits for large scale (larger than a certain threshold) computational tasks. Furthermore, overheads and HWICAP hardware utilization have been measured discussed. This work was supported by the NSFC grants No. 61379040, No. 61272131 and No. 61202053.
重构技术被认为是MPSoC设计范式中最有前途的电子设计自动化(EDA)技术之一。然而,由于重构过程中不可避免的延迟,如何有效地分析软件/硬件执行、静态重构和动态重构之间的权衡仍然是一个重大挑战。在本文中,我们首先提出了一个异构MPSoC中间件来支持最先进的动态部分可重构技术。此外,我们评估了重新配置的延迟,并分析了动态部分重新配置技术的权衡。作为一项实际研究,采用最先进的静态/动态部分可重构技术,在Xilinx Zynq FPGA上开发了具有JPEG应用的异构MPSoC原型。JPEG案例研究的实验结果证明了软件执行、硬件执行和静态/动态重新配置之间的平衡。对于定量方法,我们演示了JPEG中硬件步骤的不同配置的执行时间,以及动态重新配置执行的定量影响。对于大规模(大于某个阈值)的计算任务,动态重构可以获得性能优势。此外,还对开销和HWICAP硬件利用率进行了测量。国家自然科学基金项目(61379040、61272131和61202053)资助。
{"title":"Co-processing with dynamic reconfiguration on heterogeneous MPSoC: practices and design tradeoffs (abstract only)","authors":"Chao Wang, Xi Li, Xuehai Zhou, Yunji Chen, K. Bertels","doi":"10.1145/2554688.2554695","DOIUrl":"https://doi.org/10.1145/2554688.2554695","url":null,"abstract":"Reconfiguration technique has been considered as one of the most promising electronic design automation (EDA) technologies in MPSoC design paradigms. However, due to the unavoidable latency in the reconfiguration procedure, it still poses a significant challenge to efficiently analyze the trade-offs for the software/hardware execution, static reconfiguration and dynamic reconfiguration. In this paper we first present a heterogeneous MPSoC middleware to support state-of-the-art dynamic partial reconfigurable technologies. Furthermore, we evaluate the reconfiguration latency and analyze the trade-off for the dynamic partial reconfiguration technologies. As a practical study, a heterogeneous MPSoC prototype with JPEG application has been developed on Xilinx Zynq FPGA with state-of-the-art static/dynamic partial reconfigurable technologies. Experimental results on the JPEG case studies demonstrated the leverage among the software execution, hardware execution, and static/dynamic reconfiguration. For the quantitative approach, we have demonstrated the execution time for the different configuration of the hardware steps in JPEG, and the quantitative impact of the dynamic reconfiguration execution. The dynamic reconfiguration could gain the performance benefits for large scale (larger than a certain threshold) computational tasks. Furthermore, overheads and HWICAP hardware utilization have been measured discussed. This work was supported by the NSFC grants No. 61379040, No. 61272131 and No. 61202053.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116161565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A FPGA prototype design emphasis on low power technique 以低功耗技术为重点的FPGA原型设计
Xu Hanyang, Wang Jian, Jin Meilai
In this paper, we propose a fully-functional Nanometer FPGA prototype chip. Compared to traditional single supply voltage, single threshold voltage design, we explore low power nanometer FPGA design challenges with Multi-Vt, Static Voltage Scaling and sleep mode technique. Compared to Dynamic Voltage Scaling (DVS), we make a table of Voltage-Delay parameter pairs under different voltage conditions so that timing information can be calculated by a Static Timing Analysis (STA) tool. Thus a lowest supply power is chosen among all results which meet the timing requirements. This approach would simplify the hardware design since we don't need a complex workload detection circuit compared to DVS system. By separating supply voltages, we can directly shutdown power supply of the unused circuits. Compared to inserting sleep transistor in pull-up or pull-down networks, we can eliminate the speed penalty cased by the additional sleep transistor. We implement a tile-based heterogeneous architecture with island style routing and embedded specific blocks such as DSP and memory. The array size is 64×31 (Row×Col) including 64×24 CLBs. The final design is fabricated using a 1P10M 65-nm bulk CMOS process. Test results show a 53% reduction in static power compared to a commercial FPGA device which is also fabricated in 65nm process and has a similar array size.
在本文中,我们提出了一个全功能的纳米FPGA原型芯片。与传统的单电源电压、单阈值电压设计相比,我们探索了基于Multi-Vt、静态电压缩放和休眠模式技术的低功耗纳米FPGA设计挑战。与动态电压缩放(DVS)相比,我们制作了不同电压条件下的电压延迟参数对表,以便通过静态时序分析(STA)工具计算时序信息。从而在满足时序要求的所有结果中选择最小的电源功率。这种方法可以简化硬件设计,因为与分布式交换机系统相比,我们不需要复杂的工作负载检测电路。通过分离电源电压,我们可以直接关闭未使用电路的电源。与在上拉或下拉网络中插入睡眠晶体管相比,我们可以消除额外的睡眠晶体管造成的速度损失。我们实现了一个基于tile的异构架构,带有岛式路由和嵌入式特定块,如DSP和内存。数组大小为64×31 (Row×Col),包含64×24 clb。最终设计采用1P10M 65nm块体CMOS工艺制造。测试结果表明,与同样采用65nm工艺制造且具有相似阵列尺寸的商用FPGA器件相比,静态功耗降低了53%。
{"title":"A FPGA prototype design emphasis on low power technique","authors":"Xu Hanyang, Wang Jian, Jin Meilai","doi":"10.1145/2554688.2554762","DOIUrl":"https://doi.org/10.1145/2554688.2554762","url":null,"abstract":"In this paper, we propose a fully-functional Nanometer FPGA prototype chip. Compared to traditional single supply voltage, single threshold voltage design, we explore low power nanometer FPGA design challenges with Multi-Vt, Static Voltage Scaling and sleep mode technique. Compared to Dynamic Voltage Scaling (DVS), we make a table of Voltage-Delay parameter pairs under different voltage conditions so that timing information can be calculated by a Static Timing Analysis (STA) tool. Thus a lowest supply power is chosen among all results which meet the timing requirements. This approach would simplify the hardware design since we don't need a complex workload detection circuit compared to DVS system. By separating supply voltages, we can directly shutdown power supply of the unused circuits. Compared to inserting sleep transistor in pull-up or pull-down networks, we can eliminate the speed penalty cased by the additional sleep transistor. We implement a tile-based heterogeneous architecture with island style routing and embedded specific blocks such as DSP and memory. The array size is 64×31 (Row×Col) including 64×24 CLBs. The final design is fabricated using a 1P10M 65-nm bulk CMOS process. Test results show a 53% reduction in static power compared to a commercial FPGA device which is also fabricated in 65nm process and has a similar array size.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121336954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Control signal aware slice-level window based legalization method for FPGA placement (abstract only) 基于控制信号感知的片级窗口的FPGA放置合法化方法(仅摘要)
Yu Wang, Donghoon Yeo, Muhammad Sohail, Hyunchul Shin
The control signal sharing while packing flip-flops and other instances in slices is a necessary constraint in the placement of instances in FPGAs. Global placement usually does not consider signal sharing. In this paper, we propose a control signal aware slice-level packing algorithm within the framework of window based legalization method to obtain an optimized legal layout, satisfying all constraints, after global placement. We select a target window with the highest number of overlaps. Then, we check the capacity of the target window and adjust its size to secure enough space required for legalization. Lastly, window based legalization takes three constraints into account: 1) Control Signal Sharing: Two Flip-Flops in a slice must share a single control signal in FPGA architecture. 2) CLB Architecture Matching: Instances should be placed within a half slice to minimize the routing requirement. 3) Slice Level Packing: Instances are packed into slices for effective utilization of available empty space within a window. The experimental results show that our algorithm performs better with 45% less block displacement and 10% less runtime with the same wirelength when compared to a previous well-known mixed size block greedy legalization method [1].
当将触发器和其他实例打包成片时,控制信号共享是fpga中实例放置的必要约束。全局布局通常不考虑信号共享。本文在基于窗口的合法化方法框架内提出了一种控制信号感知的片级封装算法,以获得全局布局后满足所有约束条件的优化合法布局。我们选择一个有最多重叠的目标窗口。然后,我们检查目标窗口的容量并调整其大小以确保合法化所需的足够空间。最后,基于窗口的合法化考虑了三个约束条件:1)控制信号共享:在FPGA架构中,片中的两个触发器必须共享单个控制信号。2) CLB架构匹配:实例应该放在半片内,以最小化路由需求。3)片级打包:实例被打包到片中,以便有效利用窗口内的可用空白空间。实验结果表明,与之前众所周知的混合大小块贪婪合法化方法[1]相比,我们的算法在相同的无线长度下,块位移减少45%,运行时间减少10%,性能更好。
{"title":"Control signal aware slice-level window based legalization method for FPGA placement (abstract only)","authors":"Yu Wang, Donghoon Yeo, Muhammad Sohail, Hyunchul Shin","doi":"10.1145/2554688.2554727","DOIUrl":"https://doi.org/10.1145/2554688.2554727","url":null,"abstract":"The control signal sharing while packing flip-flops and other instances in slices is a necessary constraint in the placement of instances in FPGAs. Global placement usually does not consider signal sharing. In this paper, we propose a control signal aware slice-level packing algorithm within the framework of window based legalization method to obtain an optimized legal layout, satisfying all constraints, after global placement. We select a target window with the highest number of overlaps. Then, we check the capacity of the target window and adjust its size to secure enough space required for legalization. Lastly, window based legalization takes three constraints into account: 1) Control Signal Sharing: Two Flip-Flops in a slice must share a single control signal in FPGA architecture. 2) CLB Architecture Matching: Instances should be placed within a half slice to minimize the routing requirement. 3) Slice Level Packing: Instances are packed into slices for effective utilization of available empty space within a window. The experimental results show that our algorithm performs better with 45% less block displacement and 10% less runtime with the same wirelength when compared to a previous well-known mixed size block greedy legalization method [1].","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127764393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
期刊
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1