Dataflow optimization for programmable embedded image preprocessing accelerators

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). Pub Date: 2016-11-01. DOI: 10.1109/ReConFig.2016.7857161

T. Lieske, M. Reichenbach, Burkhard Ringlein, D. Fey


Citations: 2


Dataflow optimization for programmable embedded image preprocessing accelerators

Image processing is ubiquitous in current embedded industrial and consumer applications. It is therefore important to investigate processing architectures and extract design guidelines for developing efficient image processors. While SIMD (single instruction, multiple data) processor arrays have often been proposed to accelerate image processing tasks, the internal architecture of the processor elements (PEs) has not been optimized. It is nevertheless necessary to evaluate the optimal complexity of PEs in order to trade off performance against the architectural overhead caused by complex processor architectures. Hence, the goal of this paper is to present an in-depth evaluation of how to find the right architectural complexity of PEs in a processor field to meet given performance and logic area constraints. To determine the optimal complexity, the ADL (architecture description language) based FAUPU framework for image preprocessing architectures is used and, after the evaluation, extended with pipelining support. The newly introduced pipelining features enable resource-efficient performance optimizations and are a significant improvement to the FAUPU ADL. Owing to the fine-grained configurability of the FAUPU architecture, several design variants can be generated easily, making it possible to evaluate the effects of instruction set architecture (ISA) complexity and pipelining on design properties and how these features are best combined. Consequently, the FAUPU framework can be used to address the question of whether it is better to use many lightweight cores or whether fewer but more complex cores yield a greater performance-to-area ratio. The results show that lightweight cores are best suited to achieving a targeted frame rate with the least resources. More complex cores, on the other hand, yield better performance-to-area ratios.
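The two closing findings can look contradictory at first glance. The following is a minimal sketch of one way they could coexist, assuming throughput scales linearly with the number of PEs and that cores can only be added in whole units. The class `PEVariant`, the helper functions, and every number are hypothetical and chosen purely for illustration; none of them come from the paper or from the FAUPU framework.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PEVariant:
    """One hypothetical processor-element (PE) configuration."""
    name: str
    area: int                 # logic area of a single PE (arbitrary units, assumed)
    pixels_per_second: float  # sustained throughput of a single PE (assumed)


def cores_needed(pe: PEVariant, width: int, height: int, fps: float) -> int:
    """Smallest whole number of identical PEs that sustains the frame rate,
    assuming throughput scales linearly with the PE count."""
    required_pixels_per_second = width * height * fps
    return math.ceil(required_pixels_per_second / pe.pixels_per_second)


def total_area(pe: PEVariant, width: int, height: int, fps: float) -> int:
    """Logic area of the whole PE array needed for the target frame rate."""
    return cores_needed(pe, width, height, fps) * pe.area


def performance_per_area(pe: PEVariant) -> float:
    """Throughput delivered per unit of logic area by a single PE."""
    return pe.pixels_per_second / pe.area


if __name__ == "__main__":
    # Invented example values, not measurements from the paper.
    variants = [
        PEVariant("lightweight", area=200, pixels_per_second=2_500_000.0),
        PEVariant("complex", area=900, pixels_per_second=13_500_000.0),
    ]
    for pe in variants:
        n = cores_needed(pe, 640, 480, 30.0)
        print(pe.name, "cores:", n,
              "total area:", total_area(pe, 640, 480, 30.0),
              "perf/area:", performance_per_area(pe))
```

With these invented values the complex core has the higher performance-to-area ratio, but a single complex core overshoots the 640x480 at 30 fps target, so the array of lightweight cores meets the same target with less total area. Whether this granularity effect is what drives the results reported in the paper is not stated in the abstract.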

