Rethinking Integer Divider Design for FPGA-Based Soft-Processors

Eric Matthews, Alec Lu, Zhenman Fang, Lesley Shannon

DOI: 10.1145/3502492 (https://doi.org/10.1145/3502492)
Venue: 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
Published: 2019-04-01
Citations: 11

Abstract

Most existing soft-processors on FPGAs today support a fixed-latency instruction pipeline. Therefore, for integer division, a simple fixed-latency radix-2 integer divider is typically used, or algorithm-level changes are made to avoid integer divisions. However, for certain important application domains the simple radix-2 integer divider becomes the performance bottleneck, as every 32-bit division operation takes 32 cycles. In this paper, we explore integer divider designs for FPGA-based soft-processors, by leveraging the recent support of variable-latency execution units in their instruction pipeline. We implement a high-performance, data-dependent, variable-latency integer divider called Quick-Div, optimize its performance on FPGAs, and integrate it into a RISC-V soft-processor called Taiga that supports a variable-latency instruction pipeline. We perform a comprehensive analysis and comparison—in terms of cycles, clock frequency, and resource usage—for both the fixed-latency radix-2/4/8/16 dividers and our variable-latency Quick-Div divider with various optimizations. Experimental results on a Xilinx Virtex UltraScale+ VCU118 FPGA board show that our Quick-Div divider can provide over 5x better performance and over 4x better performance/LUT compared to a radix-2 divider for certain applications like random number generation. Finally, through a case study of integer square root, we demonstrate that our Quick-Div divider provides opportunities for reconsidering simpler and faster algorithmic choices.
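
To make the cycle-count argument in the abstract concrete, the sketch below models two dividers in software. The first is a plain restoring radix-2 divider, which always takes one iteration per bit (32 for a 32-bit operand). The second is a hypothetical data-dependent variant that skips quotient bits known to be zero from the operands' bit lengths. It is not the Quick-Div algorithm, whose details the abstract does not give; the function names are illustrative, and the "cycles" are loop iterations in this model, not measured hardware cycles.

```python
def radix2_divide(dividend: int, divisor: int, width: int = 32):
    """Restoring radix-2 division: one quotient bit per iteration.

    Always runs `width` iterations, regardless of the operand values.
    Returns (quotient, remainder, iterations).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    assert 0 <= dividend < (1 << width) and 0 < divisor < (1 << width)

    remainder, quotient, cycles = 0, 0, 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
        cycles += 1
    return quotient, remainder, cycles


def early_exit_divide(dividend: int, divisor: int, width: int = 32):
    """Illustrative variable-latency variant (an assumption, not Quick-Div).

    Quotient bits above position len(dividend) - len(divisor) must be zero,
    so the loop starts there and the iteration count depends on the data.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    assert 0 <= dividend < (1 << width) and 0 < divisor < (1 << width)

    shift = dividend.bit_length() - divisor.bit_length()
    if shift < 0:
        # Dividend is smaller than divisor: no iterations needed.
        return 0, dividend, 0

    # The skipped high bits of the dividend form the initial partial
    # remainder, which is guaranteed to be smaller than the divisor.
    remainder = dividend >> (shift + 1)
    quotient, cycles = 0, 0
    for bit in range(shift, -1, -1):
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
        cycles += 1
    return quotient, remainder, cycles


print(radix2_divide(100, 7))      # (14, 2, 32)
print(early_exit_divide(100, 7))  # (14, 2, 5)
```

In this model, dividing 100 by 7 needs 5 iterations instead of 32. A fixed-latency pipeline cannot exploit that difference, which is why the abstract ties the divider design to variable-latency execution units.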