Program Balancing in Compilation for Buffered Hybrid Dataflow Processors

2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC). Pub Date: 2023-06-01. DOI: 10.1109/COMPSAC57700.2023.00018

Anoop Bhagyanath, K. Schneider


Citations: 2

Abstract

In traditional von Neumann processors, the central register file is an inherent limiting factor in exploiting the instruction-level parallelism (ILP) of programs. To alleviate this problem, many processors follow a hybrid von Neumann/dataflow computing model. In this model, specific instruction sequences are executed in dataflow order: intermediate values are communicated directly from producer processing units (PUs) to consumer PUs without passing through a central register file. However, these intermediate values often reside in local registers of the PUs, so the data transports must still be synchronized, and this continues to limit the exploitation of ILP. To avoid both the central register file and any synchronization between PUs, some newer architectures place first-in-first-out (FIFO) buffers instead of local registers at the input and output ports of the PUs. Values in these buffers are produced and consumed rather than overwritten (as they would be in registers), so the compiler must determine the required number of copies of each value. Furthermore, the number of copies must be controlled in order to develop buffer-size-aware compilation methods. However, the number of uses of a variable in a sequential program may depend on how execution proceeds later, for example on which branch is taken. This paper presents transformations for ‘balancing’ a given program. A balanced program is one in which, at every point, the number of future uses of every variable can be determined exactly, so that the required buffer sizes can be allocated in later compilation phases. The experimental results demonstrate the classical space-time trade-off: processor performance improves with larger buffers and degrades with smaller ones. More importantly, the results demonstrate the potential of buffered hybrid dataflow architectures for a scalable use of ILP.
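To make the idea of balancing concrete, the following Python sketch uses a hypothetical toy IR (`Stmt`, `If`) in which each read of a variable consumes one copy from a FIFO buffer. `use_counts` returns the number of future uses per variable, or `None` when that number depends on which branch executes. `balance` equalizes the two arms of each branch by appending hypothetical `discard` operations that drain surplus copies. All names here, and the discard-based strategy itself, are illustrative assumptions and not necessarily the transformations used in the paper.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Stmt:
    """Hypothetical instruction; each entry in `reads` consumes one buffered copy."""
    reads: List[str]
    op: str = "use"  # "use", or the hypothetical "discard" that drops copies


@dataclass
class If:
    """Hypothetical two-way branch with a then-block and an else-block."""
    then: List[Node]
    orelse: List[Node]


Node = Union[Stmt, If]


def use_counts(block: List[Node]) -> Optional[Dict[str, int]]:
    """Future uses per variable, or None if they depend on the branch taken."""
    counts: Counter = Counter()
    for node in block:
        if isinstance(node, Stmt):
            counts.update(node.reads)
            continue
        t, e = use_counts(node.then), use_counts(node.orelse)
        if t is None or e is None or t != e:
            return None  # unbalanced: the copy count is not statically known
        counts.update(t)
    return dict(counts)


def balance(block: List[Node]) -> List[Node]:
    """Return a new block in which both arms of every If consume equally often."""
    out: List[Node] = []
    for node in block:
        if isinstance(node, Stmt):
            out.append(node)
            continue
        then_b, else_b = balance(node.then), balance(node.orelse)
        t, e = use_counts(then_b), use_counts(else_b)
        assert t is not None and e is not None  # nested Ifs are balanced already
        for var in sorted(set(t) | set(e)):
            diff = t.get(var, 0) - e.get(var, 0)
            if diff > 0:  # else-arm would leave `diff` copies unread: drain them
                else_b.append(Stmt([var] * diff, op="discard"))
            elif diff < 0:  # then-arm would leave copies unread
                then_b.append(Stmt([var] * -diff, op="discard"))
        out.append(If(then_b, else_b))
    return out


# Usage: x is read once, then twice more on the then-path but not on the else-path.
prog: List[Node] = [
    Stmt(["x"]),
    If(then=[Stmt(["x", "y"]), Stmt(["x"])], orelse=[Stmt(["y"])]),
]
print(use_counts(prog))      # None: x has either 3 or 1 future uses
balanced = balance(prog)
print(use_counts(balanced))  # {'x': 3, 'y': 1}
```

In the unbalanced program, the producer of `x` cannot know whether to emit three copies or one. After balancing, three copies suffice on either path, and the two copies left over on the else-path are consumed by the discard. Appending the discard at the end of the shorter arm is the simplest placement. A real compiler would likely also weigh where the drain costs the least buffer space.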
