Rethinking Integer Divider Design for FPGA-Based Soft-Processors

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2019-04-01 DOI:10.1145/3502492

Eric Matthews, Alec Lu, Zhenman Fang, Lesley Shannon

{"title":"Rethinking Integer Divider Design for FPGA-Based Soft-Processors","authors":"Eric Matthews, Alec Lu, Zhenman Fang, Lesley Shannon","doi":"10.1145/3502492","DOIUrl":null,"url":null,"abstract":"Most existing soft-processors on FPGAs today support a fixed-latency instruction pipeline. Therefore, for integer division, a simple fixed-latency radix-2 integer divider is typically used, or algorithm-level changes are made to avoid integer divisions. However, for certain important application domains the simple radix-2 integer divider becomes the performance bottleneck, as every 32-bit division operation takes 32 cycles. In this paper, we explore integer divider designs for FPGA-based soft-processors, by leveraging the recent support of variable-latency execution units in their instruction pipeline. We implement a high-performance, data-dependent, variable-latency integer divider called Quick-Div, optimize its performance on FPGAs, and integrate it into a RISC-V soft-processor called Taiga that supports a variable-latency instruction pipeline. We perform a comprehensive analysis and comparison—in terms of cycles, clock frequency, and resource usage—for both the fixed-latency radix-2/4/8/16 dividers and our variable-latency Quick-Div divider with various optimizations. Experimental results on a Xilinx Virtex UltraScale+ VCU118 FPGA board show that our Quick-Div divider can provide over 5x better performance and over 4x better performance/LUT compared to a radix-2 divider for certain applications like random number generation. Finally, through a case study of integer square root, we demonstrate that our Quick-Div divider provides opportunities for reconsidering simpler and faster algorithmic choices.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3502492","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

Abstract

Most existing soft-processors on FPGAs today support a fixed-latency instruction pipeline. Therefore, for integer division, a simple fixed-latency radix-2 integer divider is typically used, or algorithm-level changes are made to avoid integer divisions. However, for certain important application domains the simple radix-2 integer divider becomes the performance bottleneck, as every 32-bit division operation takes 32 cycles. In this paper, we explore integer divider designs for FPGA-based soft-processors, by leveraging the recent support of variable-latency execution units in their instruction pipeline. We implement a high-performance, data-dependent, variable-latency integer divider called Quick-Div, optimize its performance on FPGAs, and integrate it into a RISC-V soft-processor called Taiga that supports a variable-latency instruction pipeline. We perform a comprehensive analysis and comparison—in terms of cycles, clock frequency, and resource usage—for both the fixed-latency radix-2/4/8/16 dividers and our variable-latency Quick-Div divider with various optimizations. Experimental results on a Xilinx Virtex UltraScale+ VCU118 FPGA board show that our Quick-Div divider can provide over 5x better performance and over 4x better performance/LUT compared to a radix-2 divider for certain applications like random number generation. Finally, through a case study of integer square root, we demonstrate that our Quick-Div divider provides opportunities for reconsidering simpler and faster algorithmic choices.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于fpga的软处理器整数除法器设计的再思考

目前fpga上的大多数现有软处理器都支持固定延迟指令管道。因此，对于整数除法，通常使用简单的固定延迟基数-2整数除法器，或者进行算法级更改以避免整数除法。然而，对于某些重要的应用领域，简单的基数-2整数除法成为性能瓶颈，因为每个32位除法操作需要32个周期。在本文中，我们探讨了基于fpga的软处理器的整数除法设计，通过利用其指令管道中可变延迟执行单元的最新支持。我们实现了一种高性能、数据依赖、可变延迟的整数除法，称为Quick-Div，优化了其在fpga上的性能，并将其集成到一个名为Taiga的RISC-V软处理器中，该处理器支持可变延迟指令管道。我们在周期、时钟频率和资源使用方面对固定延迟基数2/4/8/16除法和可变延迟快速除法进行了全面的分析和比较，并进行了各种优化。在Xilinx Virtex UltraScale+ VCU118 FPGA板上的实验结果表明，对于某些应用(如随机数生成)，与基数2除法器相比，我们的快速除法器可以提供超过5倍的性能和超过4倍的性能/LUT。最后，通过整数平方根的案例研究，我们证明了我们的快速除法为重新考虑更简单和更快的算法选择提供了机会。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

自引率

0.00%

发文量