High Performance Instruction Scheduling Circuits for Out-of-Order Soft Processors

Henry Wong, Vaughn Betz, Jonathan Rose
{"title":"High Performance Instruction Scheduling Circuits for Out-of-Order Soft Processors","authors":"Henry Wong, Vaughn Betz, Jonathan Rose","doi":"10.1109/FCCM.2016.11","DOIUrl":null,"url":null,"abstract":"Soft processors have a role to play in easing the difficulty of designing applications into FPGAs for two reasons: first, they can be deployed only when needed, unlike permanent on-die hard processors. Second, for the portions of an application that can function sufficiently fast on a soft processor, it is far easier to write and debug single-threaded software code than to create hardware. The breadth of this second role increases when the performance of the soft processor increases, yet there has been little progress in the performance of soft processors since their commercial inception -- in particular, the sophisticated out-of-order superscalar approaches that arrived in the mid 1990s are not employed, despite the fact that their area cost is now easily tolerable. In this paper we take an important step towards out-of-order execution in soft processors by exploring instruction scheduling in an FPGA substrate. This differs from the hard-processor design problem because the logic substrate is restricted to LUTs, whereas hard processor scheduling circuits employ CAM and wired-OR structures to great benefit. We discuss both circuit and microarchitectural trade-offs, and compare three circuit structures for the scheduler, including a new structure called a fused-logic matrix scheduler. With this circuit, large schedulers up to 40 entries can be built with the same cycle time as the commercial Nios II/f soft processor (240~MHz). This careful design has the potential to significantly increase both the IPC and raw compute performance of a soft processor, compared to current commercial soft processors.","PeriodicalId":113498,"journal":{"name":"2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2016.11","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Soft processors have a role to play in easing the difficulty of designing applications into FPGAs for two reasons: first, they can be deployed only when needed, unlike permanent on-die hard processors. Second, for the portions of an application that can function sufficiently fast on a soft processor, it is far easier to write and debug single-threaded software code than to create hardware. The breadth of this second role increases when the performance of the soft processor increases, yet there has been little progress in the performance of soft processors since their commercial inception -- in particular, the sophisticated out-of-order superscalar approaches that arrived in the mid 1990s are not employed, despite the fact that their area cost is now easily tolerable. In this paper we take an important step towards out-of-order execution in soft processors by exploring instruction scheduling in an FPGA substrate. This differs from the hard-processor design problem because the logic substrate is restricted to LUTs, whereas hard processor scheduling circuits employ CAM and wired-OR structures to great benefit. We discuss both circuit and microarchitectural trade-offs, and compare three circuit structures for the scheduler, including a new structure called a fused-logic matrix scheduler. With this circuit, large schedulers up to 40 entries can be built with the same cycle time as the commercial Nios II/f soft processor (240~MHz). This careful design has the potential to significantly increase both the IPC and raw compute performance of a soft processor, compared to current commercial soft processors.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
无序软处理器的高性能指令调度电路
软处理器在缓解将应用程序设计成fpga的困难方面发挥了作用,原因有两个:首先,它们可以只在需要时部署,不像永久的硬处理器。其次,对于在软处理器上运行速度足够快的应用程序部分,编写和调试单线程软件代码要比创建硬件容易得多。当软处理器的性能提高时,第二个角色的广度也会增加,但是自从软处理器开始商业化以来,它的性能几乎没有进步——特别是,20世纪90年代中期出现的复杂的无序超标量方法没有被采用,尽管它们的面积成本现在很容易接受。在本文中,我们通过探索FPGA衬底中的指令调度,向软处理器中的乱序执行迈出了重要的一步。这与硬处理器设计问题不同,因为逻辑基板仅限于lut,而硬处理器调度电路采用CAM和有线或结构,从而受益匪浅。我们讨论了电路和微架构的权衡,并比较了调度器的三种电路结构,包括一种称为融合逻辑矩阵调度器的新结构。利用该电路,可以构建多达40个条目的大型调度器,其周期时间与商用Nios II/f软处理器(240~MHz)相同。与当前的商用软处理器相比,这种精心设计有可能显著提高软处理器的IPC和原始计算性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Spatial Predicates Evaluation in the Geohash Domain Using Reconfigurable Hardware Two-Hit Filter Synthesis for Genomic Database Search Initiation Interval Aware Resource Sharing for FPGA DSP Blocks Finding Space-Time Stream Permutations for Minimum Memory and Latency Runtime Parameterizable Regular Expression Operators for Databases
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1