Peter Koek, Stefan J. Geuns, J. Hausmans, H. Corporaal, M. Bekooij
{"title":"CSDFa: A Model for Exploiting the Trade-Off between Data and Pipeline Parallelism","authors":"Peter Koek, Stefan J. Geuns, J. Hausmans, H. Corporaal, M. Bekooij","doi":"10.1145/2906363.2906364","DOIUrl":null,"url":null,"abstract":"Real-time stream processing applications, such as Software Defined Radio applications, are often executed concurrently on multiprocessor systems. A unified data flow model and analysis method have been proposed that can be used to simultaneously determine the amount of pipeline and coarse-grained data parallelism required to meet the temporal constraints of such applications. However, this unified model is only defined for Synchronous Data Flow (SDF) graphs. Defining a unified model for a more expressive model such as Cyclo-Static Data Flow (CSDF) is not possible, because auto-concurrency can cause a time-dependent order of tokens and dependencies. This paper introduces the Cyclo-Static Data Flow with Auto-concurrency (CSDFa) model. In CSDFa, tokens have indices and the consumption order of tokens is static and time-independent. This allows expressing and trading off pipeline and coarse-grained data parallelism in a single, unified model. Furthermore, we introduce a new type of circular buffer that implements the same static order as is used by the CSDFa model. The overhead of operations on this buffer is independent of the amount of auto-concurrency, which corresponds to the constant firing durations in the CSDFa model. Exploiting the trade-off between data and pipeline parallelism with the CSDFa model is demonstrated with a part of a FMCW radar processing pipeline. We show that the CSDFa model enables optimizing the balance between processing units and memory, resulting in a significant reduction of silicon area. Additionally, it is shown that reducing the maximum allowed latency increases the minimum required amount of data parallelism by up to a factor of 16.","PeriodicalId":344390,"journal":{"name":"Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems","volume":"159 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2906363.2906364","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Real-time stream processing applications, such as Software Defined Radio applications, are often executed concurrently on multiprocessor systems. A unified data flow model and analysis method have been proposed that can be used to simultaneously determine the amount of pipeline and coarse-grained data parallelism required to meet the temporal constraints of such applications. However, this unified model is only defined for Synchronous Data Flow (SDF) graphs. Defining a unified model for a more expressive model such as Cyclo-Static Data Flow (CSDF) is not possible, because auto-concurrency can cause a time-dependent order of tokens and dependencies. This paper introduces the Cyclo-Static Data Flow with Auto-concurrency (CSDFa) model. In CSDFa, tokens have indices and the consumption order of tokens is static and time-independent. This allows expressing and trading off pipeline and coarse-grained data parallelism in a single, unified model. Furthermore, we introduce a new type of circular buffer that implements the same static order as is used by the CSDFa model. The overhead of operations on this buffer is independent of the amount of auto-concurrency, which corresponds to the constant firing durations in the CSDFa model. Exploiting the trade-off between data and pipeline parallelism with the CSDFa model is demonstrated with a part of a FMCW radar processing pipeline. We show that the CSDFa model enables optimizing the balance between processing units and memory, resulting in a significant reduction of silicon area. Additionally, it is shown that reducing the maximum allowed latency increases the minimum required amount of data parallelism by up to a factor of 16.