T. Lieske, M. Reichenbach, Burkhard Ringlein, D. Fey
Published in: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2016-11-01. DOI: 10.1109/ReConFig.2016.7857161
Dataflow optimization for programmable embedded image preprocessing accelerators
Image processing is ubiquitous in current embedded industrial and consumer applications, so it is important to investigate processing architectures and extract design guidelines for efficient image processors. SIMD (single instruction, multiple data) processor arrays have often been proposed to accelerate image processing tasks, but the internal architecture of their processor elements (PEs) has not been optimized. The complexity of the PEs must be evaluated in order to trade off performance against the architectural overhead that complex processor architectures introduce. The goal of this paper is therefore an in-depth evaluation of the right architectural complexity for PEs in a processor field under given performance and logic area constraints. To determine this complexity, the ADL (architecture description language) based FAUPU framework for image preprocessing architectures is used and, following the evaluation, extended with pipelining support. The newly introduced pipelining features enable resource-efficient performance optimizations and are a significant improvement to the FAUPU ADL. Because the FAUPU architecture is configurable at a fine granularity, design variants can be generated easily, which makes it possible to evaluate how instruction set architecture (ISA) complexity and pipelining affect design properties and how these features are best combined. The FAUPU framework can thus be used to address the question of whether many lightweight cores or fewer, more complex cores yield the greater performance-to-area ratio. The results show that lightweight cores are best suited to achieving a targeted frame rate with the fewest resources, whereas more complex cores yield better performance-to-area ratios.
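The two closing results can look contradictory at first glance. The following Python sketch shows one way they can hold together: cores come in whole units, so a variant with the better performance-to-area ratio may still overshoot a given frame-rate target and cost more total area. All names and numbers here (`CoreVariant`, `LIGHT`, `COMPLEX`, the area and frame-rate values) are hypothetical and are not taken from the paper or from the FAUPU framework, and the sketch assumes that frame rate scales linearly with the number of cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreVariant:
    """Hypothetical PE variant; values are illustrative, not measured."""
    name: str
    area: int          # logic area per core, arbitrary units
    fps_per_core: int  # frame rate a single core sustains


def perf_per_area(core: CoreVariant) -> float:
    """Performance-to-area ratio of a single core."""
    return core.fps_per_core / core.area


def cores_needed(core: CoreVariant, target_fps: int) -> int:
    """Smallest whole number of cores that reaches target_fps.

    Assumes ideal linear scaling of frame rate with core count.
    """
    return (target_fps + core.fps_per_core - 1) // core.fps_per_core


def total_area(core: CoreVariant, target_fps: int) -> int:
    """Total logic area of the array sized for target_fps."""
    return cores_needed(core, target_fps) * core.area


# Made-up design points: the complex core is 4x larger but 5x faster.
LIGHT = CoreVariant("lightweight", area=100, fps_per_core=10)
COMPLEX = CoreVariant("complex", area=400, fps_per_core=50)

if __name__ == "__main__":
    target = 60  # hypothetical target frame rate
    for core in (LIGHT, COMPLEX):
        print(
            f"{core.name}: ratio={perf_per_area(core):.3f}, "
            f"cores={cores_needed(core, target)}, "
            f"area={total_area(core, target)}"
        )
```

With these invented values the complex variant has the higher ratio per core, yet the lightweight array reaches the target with less total area because it can be sized in finer steps. Whether this granularity effect is what drives the reported results is an assumption of this sketch, not a claim made by the abstract.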