Using Multi-op Instructions as a Way to Generate ASIPs with Optimized Pipeline Structure

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines · DOI: 10.1109/FCCM.2014.16

Y. Ben-Asher, Irina Lipov, V. Tartakovsky, Dror Tiv

Citations: 2

Abstract

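We propose the automatic synthesis of application-specific instruction-set processors (ASIPs). We use pipelined execution of multi-op machine instructions, e.g., *(reg1*reg2) = (*reg3) + (*reg4) (in C syntax), an instruction that passes through a pipeline with three memory stages and two arithmetic stages. The problem is, for a given set of loops, to find a pipeline configuration and a multi-op ISA that maximize the IPC (instructions per cycle) while minimizing both the resource usage and the cost of the interconnections to the register file of the resulting CPU. The algorithm is based on finding an efficient cover of a large graph by a small set of convex subgraphs g_i that are consistent with a given pipeline structure. Unlike previous works, the g_i are not synthesized into circuits executed in a co-processor mode; instead, both the g_i and the rest of the program are executed by the same set of multi-op pipeline units. In this way we eliminate the overhead associated with the co-processor mode of regular ASIPs while retaining their high IPC. The main advantage of pipelined execution of multi-op instructions over VLIW instructions is shown to be the lower cost of the interconnections between the CPU's execution units and the register file. We therefore devise a grading function that, for each possible multi-op pipeline configuration, balances the expected IPC against the complexity of the interconnections. Using this grading function, we show that in most cases the VLIW configuration is not the best choice.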
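A minimal C sketch of what the example instruction computes, assuming a hypothetical word-addressed array `mem` and a hypothetical helper `multi_op`. It models only the sequential semantics of the instruction, not the pipeline timing or how the paper actually maps operations onto stages:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical word-addressed data memory; register values act as word indices. */
#define MEM_WORDS 64
static int32_t mem[MEM_WORDS];

/* Sequential semantics of the example multi-op instruction
 *     *(reg1*reg2) = (*reg3) + (*reg4)
 * broken into the individual operations it combines. */
static void multi_op(int32_t reg1, int32_t reg2, int32_t reg3, int32_t reg4) {
    assert(reg3 >= 0 && reg3 < MEM_WORDS);
    assert(reg4 >= 0 && reg4 < MEM_WORDS);
    int32_t a = mem[reg3];       /* memory: load *reg3 */
    int32_t b = mem[reg4];       /* memory: load *reg4 */
    int32_t addr = reg1 * reg2;  /* arithmetic: compute the store address */
    int32_t sum = a + b;         /* arithmetic: add the loaded operands */
    assert(addr >= 0 && addr < MEM_WORDS);
    mem[addr] = sum;             /* memory: store the result to *(reg1*reg2) */
}
```

Three of the five operations touch memory and two are arithmetic, matching the three memory stages and two arithmetic stages described in the abstract; a plain load/store ISA would typically spend five separate instructions (two loads, a multiply, an add, a store) on the same work.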
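The cover relies on the subgraphs g_i being convex. Under the usual definition in instruction-set-extension work (an assumption here, since the abstract does not spell it out), a node set S of a dataflow DAG is convex when no path leaves S and later re-enters it, so S can issue as one atomic instruction. The following sketch checks only that property; the adjacency-matrix representation, `MAX_NODES`, and the helper names `transitive_closure` and `is_convex` are hypothetical, and this is not the paper's covering algorithm:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16 /* hypothetical upper bound on DAG size */

/* reach[u][v] becomes true iff there is a directed path of length >= 1 from u to v. */
static void transitive_closure(size_t n, bool adj[MAX_NODES][MAX_NODES],
                               bool reach[MAX_NODES][MAX_NODES]) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            reach[i][j] = adj[i][j];
    /* Warshall's algorithm: allow paths through intermediate node k. */
    for (size_t k = 0; k < n; k++)
        for (size_t i = 0; i < n; i++)
            if (reach[i][k])
                for (size_t j = 0; j < n; j++)
                    if (reach[k][j])
                        reach[i][j] = true;
}

/* S (marked by in_s) is convex iff no node outside S is both reachable
 * from some node of S and able to reach some node of S. */
static bool is_convex(size_t n, bool adj[MAX_NODES][MAX_NODES],
                      const bool in_s[MAX_NODES]) {
    assert(n <= MAX_NODES);
    bool reach[MAX_NODES][MAX_NODES];
    transitive_closure(n, adj, reach);
    for (size_t v = 0; v < n; v++) {
        if (in_s[v])
            continue;
        bool from_s = false, to_s = false;
        for (size_t u = 0; u < n; u++) {
            if (!in_s[u])
                continue;
            if (reach[u][v]) from_s = true; /* a path leaves S into v */
            if (reach[v][u]) to_s = true;   /* a path from v re-enters S */
        }
        if (from_s && to_s)
            return false;
    }
    return true;
}
```

For a DAG 0 -> 1 -> 2 with an extra edge 0 -> 2, the set {0, 2} is not convex, because the path 0 -> 1 -> 2 passes through node 1 outside the set, while {0, 1} is convex. The closure is recomputed on every call in O(n^3) time, which keeps the sketch short but is not how a search over many candidate subgraphs would be organized.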
