Program Balancing in Compilation for Buffered Hybrid Dataflow Processors
Anoop Bhagyanath, K. Schneider
2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC), 2023-06-01
DOI: https://doi.org/10.1109/COMPSAC57700.2023.00018
Citation count: 2
Abstract
In traditional von Neumann processors, the central register file inherently limits how much instruction-level parallelism (ILP) a program can exploit. To alleviate this problem, many processors follow a hybrid von Neumann/dataflow computing model in which specific instruction sequences are executed in dataflow order: intermediate values are communicated directly from producer processing units (PUs) to consumer PUs without using a central register file. However, the intermediate values often reside in local registers of the PUs, so the data transports must be synchronized, which still limits the exploitation of ILP. To avoid both the central register file and any synchronization between PUs, some newer architectures suggest first-in-first-out (FIFO) buffers instead of local registers at the input and output ports of the PUs. Since values in such buffers are produced and consumed rather than overwritten as in registers, the compiler must determine the required number of copies of each value. Furthermore, the number of copies must be controlled in order to develop buffer-size-aware compilation methods. However, the number of uses of a variable in a sequential program may depend on the future execution. This paper presents transformations for ‘balancing’ a given program, i.e., transforming it so that at every program point the number of future uses of every variable can be determined accurately, allowing the required buffer sizes to be allocated in later compilation phases. The experimental results demonstrate the classical space-time trade-off: processor performance improves with increasing buffer sizes and vice versa. More importantly, they demonstrate the potential of buffered hybrid dataflow architectures for a scalable use of ILP.
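The idea that the number of future uses may depend on which path is executed can be illustrated with a small sketch. The following Python code is not the paper's transformation; it is a simplified illustration under stated assumptions: instructions are hypothetical `(dest, operands)` tuples, both branches read only variables that are live before the branch, and a made-up `_drop` pseudo-instruction consumes one copy of a value without using it. Padding the branch with fewer reads makes the use count of each variable independent of the branch taken.

```python
from collections import Counter

def count_uses(instrs):
    """Count how often each variable is read in a straight-line instruction list."""
    uses = Counter()
    for _dest, operands in instrs:
        uses.update(operands)
    return uses

def balance_branches(then_instrs, else_instrs):
    """Pad both branches with hypothetical '_drop' reads until they
    consume every variable the same number of times."""
    then_uses = count_uses(then_instrs)
    else_uses = count_uses(else_instrs)
    balanced_then = list(then_instrs)
    balanced_else = list(else_instrs)
    for var in sorted(set(then_uses) | set(else_uses)):
        # The larger per-branch count is the number of copies to provide.
        target = max(then_uses[var], else_uses[var])
        balanced_then += [("_drop", (var,))] * (target - then_uses[var])
        balanced_else += [("_drop", (var,))] * (target - else_uses[var])
    return balanced_then, balanced_else

# Hypothetical branch bodies: 'then' computes y from x twice, 'else' copies a.
then_branch = [("y", ("x", "x"))]
else_branch = [("y", ("a",))]
balanced_then, balanced_else = balance_branches(then_branch, else_branch)
print(count_uses(balanced_then) == count_uses(balanced_else))
```

After padding, a compiler could provide two copies of `x` and one copy of `a` before the branch regardless of its outcome. The sketch ignores loops, nested control flow, and values defined inside the branches.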