High Performance Instruction Scheduling Circuits for Out-of-Order Soft Processors

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
Pub Date: 2016-05-01
DOI: 10.1109/FCCM.2016.11

Henry Wong, Vaughn Betz, Jonathan Rose

Citations: 8

Abstract

Soft processors have a role to play in easing the difficulty of designing applications into FPGAs for two reasons: first, they can be deployed only when needed, unlike permanent on-die hard processors. Second, for the portions of an application that can function sufficiently fast on a soft processor, it is far easier to write and debug single-threaded software code than to create hardware. The breadth of this second role increases when the performance of the soft processor increases, yet there has been little progress in the performance of soft processors since their commercial inception; in particular, the sophisticated out-of-order superscalar approaches that arrived in the mid-1990s are not employed, despite the fact that their area cost is now easily tolerable. In this paper we take an important step towards out-of-order execution in soft processors by exploring instruction scheduling in an FPGA substrate. This differs from the hard-processor design problem because the logic substrate is restricted to LUTs, whereas hard processor scheduling circuits employ CAM and wired-OR structures to great benefit. We discuss both circuit and microarchitectural trade-offs, and compare three circuit structures for the scheduler, including a new structure called a fused-logic matrix scheduler. With this circuit, large schedulers up to 40 entries can be built with the same cycle time as the commercial Nios II/f soft processor (240 MHz). This careful design has the potential to significantly increase both the IPC and raw compute performance of a soft processor, compared to current commercial soft processors.
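
The abstract contrasts the CAM and wired-OR wakeup logic of hard processors with an FPGA fabric built from LUTs, and names a matrix-style scheduler. As a rough, non-authoritative illustration of what a generic dependency-matrix scheduler computes each cycle, the Python sketch below models wakeup and select behaviorally. It assumes single-cycle execution latency, oldest-first selection, and unique instruction names. The class and method names (`MatrixScheduler`, `insert`, `cycle`) are hypothetical, and the model does not describe the paper's fused-logic circuit or its timing.

```python
from typing import Dict, List, Optional


class MatrixScheduler:
    """Behavioral (not circuit-level) model of a generic dependency-matrix scheduler.

    Hypothetical illustration: row i of the matrix is stored as an integer bitmask,
    and bit j is set while entry i is still waiting on the instruction in entry j.
    """

    def __init__(self, num_entries: int, issue_width: int = 1) -> None:
        self.num_entries = num_entries
        self.issue_width = issue_width
        self.valid = [False] * num_entries
        self.deps = [0] * num_entries  # one dependency row per entry
        self.age = [0] * num_entries   # insertion order, used for oldest-first select
        self.name: List[Optional[str]] = [None] * num_entries
        self._next_age = 0

    def _in_flight(self) -> Dict[str, int]:
        """Map the name of each not-yet-issued instruction to its entry index."""
        return {self.name[i]: i for i in range(self.num_entries) if self.valid[i]}

    def insert(self, name: str, producers: List[str]) -> int:
        """Allocate a free entry for `name`; assumes instruction names are unique."""
        free = [i for i in range(self.num_entries) if not self.valid[i]]
        if not free:
            raise RuntimeError("scheduler full")
        slot = free[0]
        in_flight = self._in_flight()
        row = 0
        for p in producers:
            # A producer that has already issued imposes no wait (single-cycle latency).
            if p in in_flight:
                row |= 1 << in_flight[p]
        self.valid[slot] = True
        self.deps[slot] = row
        self.age[slot] = self._next_age
        self.name[slot] = name
        self._next_age += 1
        return slot

    def cycle(self) -> List[str]:
        """Run one cycle: wakeup check, select, then clear granted columns."""
        # Wakeup: a valid entry is ready when its dependency row is all zeros.
        ready = [i for i in range(self.num_entries) if self.valid[i] and self.deps[i] == 0]
        # Select: grant up to issue_width ready entries, oldest first.
        granted = sorted(ready, key=lambda i: self.age[i])[: self.issue_width]
        # Clear each granted entry's column so its dependents are ready next cycle.
        column_mask = 0
        for g in granted:
            column_mask |= 1 << g
        for i in range(self.num_entries):
            self.deps[i] &= ~column_mask
        issued: List[str] = []
        for g in granted:
            issued.append(self.name[g])
            self.valid[g] = False
            self.name[g] = None
        return issued
```

For example, with `issue_width=2`, inserting `add`, then `mul` (which depends on `add`), then `sub` issues `add` and `sub` in the first cycle and `mul` in the second. In this representation an entry's readiness check reduces to testing whether one row of bits is all zeros, rather than comparing its source tags against broadcast result tags, which is plausibly why matrix organizations suit a substrate that offers LUTs but no CAM cells.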
