Accelerating Computations on FPGA Carry Chains by Operand Compaction

2011 IEEE 20th Symposium on Computer Arithmetic. Pub Date: 2011-07-25. DOI: 10.1109/ARITH.2011.22

Thomas B. Preußer, M. Zabel, R. Spallek


Citations: 17

Abstract

This work describes carry-compact addition (CCA), a novel addition scheme that accelerates carry-chain computations on contemporary FPGA devices. Although it builds on concepts known from carry-lookahead addition and from parallel prefix adders, the CCA adapts them to the FPGA as an implementation environment. FPGAs typically provide carry-chain structures that accelerate simple ripple-carry addition (RCA). Rather than contrasting this scheme with the hierarchical addition approaches favored in hard-core VLSI designs, the CCA combines the benefits of both: it uses hierarchical structures to shorten the critical path, which still remains on a core carry chain. In contrast to previous studies of the asymptotically superior parallel prefix adders on FPGAs, the CCA is shown to outperform the standard RCA for operand widths starting at 50 bits. Wider adders, such as those used in extended-precision floating-point units and in cryptographic applications, benefit from even greater speedups. The concrete mapping of the CCA to current Xilinx and Altera architectures is described and shown to be very favorable, yielding a high speedup for a very modest investment of additional LUT resources.
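The abstract does not give the CCA construction itself, so the following is only a minimal sketch of the general idea it builds on: block-level generate/propagate signals, as in carry-lookahead addition, let a carry ripple over blocks rather than over individual bits. The function names `block_gp` and `compacted_add` and the block width `k` are assumptions made for illustration; this software model is not the authors' FPGA mapping.

```python
def block_gp(a, b, width):
    """Return (generate, propagate) for one block of `width` bits.

    generate:  the block produces a carry-out even with carry-in 0.
    propagate: the block forwards an incoming carry (a + b is all ones).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    generate = (a + b) >> width
    propagate = int((a ^ b) == mask)
    return generate, propagate


def compacted_add(a, b, n, k):
    """Add two n-bit integers using k-bit blocks (illustrative model only).

    Returns (sum mod 2**n, carry_out). The carry travels one step per
    block, so the modeled chain has about n / k stages instead of n.
    """
    carry = 0
    result = 0
    for lo in range(0, n, k):
        w = min(k, n - lo)              # the last block may be narrower
        blk_mask = (1 << w) - 1
        blk_a = (a >> lo) & blk_mask
        blk_b = (b >> lo) & blk_mask
        g, p = block_gp(blk_a, blk_b, w)
        # Sum bits of this block depend only on the carry entering it.
        result |= ((blk_a + blk_b + carry) & blk_mask) << lo
        # Short carry chain over the compacted (g, p) pairs.
        carry = g | (p & carry)
    return result, carry
```

For example, `compacted_add(x, y, 64, 4)` models a 64-bit addition whose carry passes through 16 block stages. In hardware, each pair `(g, p)` would be computed in parallel by LUT logic ahead of the carry chain; how the CCA actually arranges this on Xilinx and Altera devices is described in the paper, not here.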
