Approaching overhead-free execution on FPGA soft-processors

Charles Eric LaForest, J. Anderson, J. Gregory Steffan
{"title":"Approaching overhead-free execution on FPGA soft-processors","authors":"Charles Eric LaForest, J. Anderson, J. Gregory Steffan","doi":"10.1109/FPT.2014.7082760","DOIUrl":null,"url":null,"abstract":"Implementing systems on FPGA soft-processors, rather than as custom hardware, eases and accelerates the development process, but at the cost of a great reduction in performance. Orthogonal to limitations in parallelism or clock frequency, this reduction in performance primarily originates in the intrinsic addressing and flow-control overheads of scalar microprocessors, which expend a considerable number of cycles interleaving address calculations and branch decisions within the actual useful work. We present an improved FPGA soft-processor architecture which statically overlaps \"overhead\" computations and executes them in parallel with the \"useful\" computations, significantly reducing the number of processor cycles needed to execute sequential programs, while reducing maximum clock frequency to 0.939x of its original value. In addition to eliminating almost all overhead computations, the proposed soft-processor can operate at 500 MHz on the Altera Stratix IV FPGA - 0.909x of the absolute maximum rating. Combined, the high speed and execution efficiency increase the range of FPGA designs amenable to soft-processors rather than custom hardware. We evaluate our cycle count improvements with multiple benchmarks, achieving speedups ranging from 1.07x for control-heavy code, to 1.92x for looping code, never performing worse than the original sequential code, and always performing better than a totally unrolled loop.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"6 1","pages":"99-106"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Field-Programmable Technology (FPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPT.2014.7082760","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Implementing systems on FPGA soft-processors, rather than as custom hardware, eases and accelerates the development process, but at the cost of a great reduction in performance. Orthogonal to limitations in parallelism or clock frequency, this reduction in performance primarily originates in the intrinsic addressing and flow-control overheads of scalar microprocessors, which expend a considerable number of cycles interleaving address calculations and branch decisions within the actual useful work. We present an improved FPGA soft-processor architecture which statically overlaps "overhead" computations and executes them in parallel with the "useful" computations, significantly reducing the number of processor cycles needed to execute sequential programs, while reducing maximum clock frequency to 0.939x of its original value. In addition to eliminating almost all overhead computations, the proposed soft-processor can operate at 500 MHz on the Altera Stratix IV FPGA - 0.909x of the absolute maximum rating. Combined, the high speed and execution efficiency increase the range of FPGA designs amenable to soft-processors rather than custom hardware. We evaluate our cycle count improvements with multiple benchmarks, achieving speedups ranging from 1.07x for control-heavy code, to 1.92x for looping code, never performing worse than the original sequential code, and always performing better than a totally unrolled loop.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
在FPGA软处理器上接近无开销执行
在FPGA软处理器上实现系统,而不是作为定制硬件,简化并加速了开发过程,但代价是性能大大降低。与并行性或时钟频率的限制无关,这种性能的降低主要源于标量微处理器的固有寻址和流量控制开销,在实际有用的工作中,它们在交叉地址计算和分支决策中花费了相当多的周期。我们提出了一种改进的FPGA软处理器架构,它静态地重叠“开销”计算,并与“有用”计算并行执行,显著减少执行顺序程序所需的处理器周期数,同时将最大时钟频率降低到原始值的0.939x。除了消除几乎所有的开销计算外,所提出的软处理器可以在Altera Stratix IV FPGA上以500 MHz的频率工作-绝对最大额定的0.909倍。结合起来,高速度和执行效率增加了适合软处理器而不是定制硬件的FPGA设计范围。我们用多个基准测试来评估我们的循环计数改进,实现了从重控制代码的1.07倍到循环代码的1.92倍的加速,性能从来没有比原始顺序代码差,并且总是比完全展开的循环表现得更好。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Message from the General Chair and Program Co-Chairs Accelerator-in-Switch: A Novel Cooperation Framework for FPGAs and GPUs FPGA Accelerated HPC and Data Analytics Novel Neural Network Applications on New Python Enabled Platforms High-level synthesis - the right side of history
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1