
2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig): Latest Publications

Optimal processor interface for CGRA-based accelerators implemented on FPGAs
Pub Date : 2016-11-30 DOI: 10.1109/ReConFig.2016.7857178
L. Jung, C. Hochberger
Coarse-Grained Reconfigurable Arrays (CGRAs) can substantially boost the processing power of embedded applications. They can be included in typical system-on-chip architectures to execute the computationally demanding parts of an application. Delegating execution to the CGRA requires exchanging live-in and live-out variables between the processor core and the CGRA. In this paper, we search for the optimal interface between the surrounding system and the CGRA with respect to its impact on operating frequency, resource usage, and runtime overhead.
Citations: 2
A Zynq-based testbed for the experimental benchmarking of algorithms competing in cryptographic contests
Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857148
Farnoud Farahmand, Ekawat Homsirikamol, K. Gaj
Hardware performance evaluation of candidates competing in cryptographic contests, such as SHA-3 and CAESAR, is very important for ranking their suitability for standardization. One of the most essential performance metrics is throughput, which depends heavily on the algorithm, the hardware implementation architecture, the coding style, and tool options. The maximum throughput is calculated from the maximum clock frequency supported by each algorithm. A common way of determining the maximum clock frequency is static timing analysis, provided by CAD toolsets such as Xilinx ISE, Xilinx Vivado, and Altera Quartus Prime. In this project, we have developed a universal testbed capable of measuring the maximum clock frequency experimentally using a prototyping board. We target cryptographic hardware cores, such as implementations of SHA-3 candidates. Our testbed is built on a Zynq platform and takes advantage of software/hardware co-design. It supports two separate clock domains: one for the hardware module under test, and the other for communication between an ARM core and the hardware accelerator. We experimentally measured the maximum clock frequency and execution time of 12 Round 2 SHA-3 candidates on a ZedBoard and compared the results with the frequencies reported by Xilinx Vivado. Our results indicate that, depending on the characteristics of each algorithm, the experimentally achieved frequency may be either much higher than or the same as the frequency reported by the tools using static timing analysis. This behavior is then analyzed further, and the relevant conclusions are drawn.
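The throughput figure mentioned above is commonly derived from the clock frequency with a relation of the form throughput = block size × f_max / cycles per block. The sketch below assumes a core that absorbs one message block every fixed number of clock cycles; the function names and the example parameters (a 1088-bit block processed in 24 cycles, 250 MHz versus 300 MHz) are illustrative assumptions, not values taken from the paper.

```python
def max_throughput_mbps(block_bits: int, cycles_per_block: int, f_max_mhz: float) -> float:
    """Estimate throughput in Mbit/s for a core that processes one
    `block_bits`-bit block every `cycles_per_block` clock cycles."""
    if block_bits <= 0 or cycles_per_block <= 0 or f_max_mhz <= 0:
        raise ValueError("all parameters must be positive")
    # (bits per cycle) * (million cycles per second) = Mbit/s
    return block_bits * f_max_mhz / cycles_per_block


def frequency_gain(f_measured_mhz: float, f_static_mhz: float) -> float:
    """Ratio of an experimentally measured f_max to the static-timing estimate."""
    return f_measured_mhz / f_static_mhz


if __name__ == "__main__":
    # Hypothetical numbers: static timing reports 250 MHz, the board runs at 300 MHz.
    static = max_throughput_mbps(1088, 24, 250.0)
    measured = max_throughput_mbps(1088, 24, 300.0)
    print(f"static: {static:.1f} Mbit/s, measured: {measured:.1f} Mbit/s")
    print(f"gain: {frequency_gain(300.0, 250.0):.2f}x")
```

Because throughput is linear in f_max under this model, any gap between the measured and the static-timing frequency translates directly into the same relative throughput gap.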
Citations: 7
A configurable architecture for the generalized Hough transform applied to the analysis of huge aerial images and to traffic sign detection
Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857143
G. Kiefer, Matthias Vahl, Julian Sarcher, M. Schaeferling
Object recognition in huge image data sets, or in live camera images at interactive frame rates, is a very demanding task, especially within embedded systems. The recognition task includes localizing a reference object in a search image and determining its rotation and scaling. The Generalized Hough Transform (GHT) is known as a powerful and robust technique for this task; it transforms the search image into a 4D parameter space. However, the GHT itself is very complex and demanding in terms of computational power and memory consumption. This paper presents a novel hardware architecture that performs a complete 4D GHT at interactive frame rates on an FPGA. The architecture is configurable to allow a trade-off between performance, accuracy, and hardware usage. The proposed architecture has been implemented on a low-cost Zynq-7000 FPGA and successfully evaluated in two practical applications, namely groyne detection in aerial images and traffic sign detection.
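The voting step of the Generalized Hough Transform can be illustrated in software. The sketch below is a textbook R-table formulation (after Ballard), not the paper's FPGA architecture: each edge point with gradient direction phi casts votes into a 4D accumulator indexed by (reference x, reference y, rotation index, scale index). The function names, the nearest-bin angle quantization, and the use of a Python dict as a sparse accumulator are assumptions made for illustration.

```python
import math
from collections import defaultdict

TWO_PI = 2.0 * math.pi


def angle_bin(angle: float, n_bins: int) -> int:
    """Quantize a gradient direction (radians) to the nearest of n_bins bins."""
    return round((angle % TWO_PI) / TWO_PI * n_bins) % n_bins


def build_r_table(template_edges, ref, n_bins):
    """R-table: gradient bin -> displacement vectors from edge point to reference."""
    table = defaultdict(list)
    rx, ry = ref
    for x, y, phi in template_edges:
        table[angle_bin(phi, n_bins)].append((rx - x, ry - y))
    return table


def ght_vote(search_edges, table, n_bins, angles, scales):
    """Accumulate votes in a sparse 4D space (cx, cy, angle index, scale index)."""
    acc = defaultdict(int)
    for x, y, phi in search_edges:
        for ai, theta in enumerate(angles):
            # Undo the hypothesized rotation to look up the template gradient bin.
            displacements = table.get(angle_bin(phi - theta, n_bins), ())
            c, s = math.cos(theta), math.sin(theta)
            for si, scale in enumerate(scales):
                for dx, dy in displacements:
                    # Rotate and scale the stored displacement, then vote.
                    cx = round(x + scale * (c * dx - s * dy))
                    cy = round(y + scale * (s * dx + c * dy))
                    acc[(cx, cy, ai, si)] += 1
    return acc


if __name__ == "__main__":
    template = [(2, 0, 0.0), (0, 1, math.pi / 2), (-1, 0, math.pi)]
    table = build_r_table(template, (0, 0), n_bins=8)
    # The same shape rotated by 90 degrees, scaled by 2, moved to (10, 10).
    scene = [(10, 14, math.pi / 2), (8, 10, math.pi), (10, 8, 3 * math.pi / 2)]
    angles = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
    acc = ght_vote(scene, table, 8, angles, scales=[1.0, 2.0])
    best = max(acc, key=acc.get)
    print(best, acc[best])  # peak at (10, 10, 1, 1) with 3 votes
```

The accumulator size grows with the product of the position, rotation, and scale resolutions, which is the cost that makes a full 4D GHT demanding in memory and computation.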
Citations: 4
Hardware-accelerated pose estimation for embedded systems using Vivado HLS
Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857173
J. Joseph, Tobias Winker, Kristian Ehlers, Christopher Blochwitz, Thilo Pionteck
The focus of this work is to facilitate pose estimation and, thus, gesture recognition on embedded systems, even though these tasks have high computational performance requirements. To this end, an existing pose estimation algorithm is optimized for Xilinx High-Level Synthesis (HLS). The resulting hardware acceleration cores are compared across different optimizations, and, finally, we propose a hardware/software system design for a Xilinx Zynq ZedBoard. Using this method, we achieve a speedup of 1.6x compared with a software solution on the ARM processor and thus facilitate hand tracking on low-power embedded systems.
Citations: 2
ReOrder: Runtime datapath generation for high-throughput multi-stream processing
Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857185
Andreas Becher, S. Wildermann, Moritz Mühlenthaler, J. Teich
Modern programmable FPGA-based SoCs that tightly couple a CPU and programmable logic enable on-demand hardware acceleration of stream processing by exploiting the available high input and output throughput and the reconfigurability of both software and hardware. In this paper, we present the concept and implementation of a hardware unit called ReOrder that serves as a converter for multiple parallel streams of data read from and written to an accelerator. Our technique and programmable design allow flexible data access and connect different stream processing accelerators independently of the host data layout. To achieve high accelerator throughput, an optimized datapath must be determined according to the accelerator's internal schedule of input and output data. We are concerned with an online setting, in which either the data layout (e.g., in modern database systems) or the accelerator's operational mode changes dynamically. Therefore, an algorithm is required that can be used at runtime to maintain an optimized datapath configuration. We propose an efficient heuristic algorithm and a corresponding FPGA design that can translate arbitrary (multi-source) data layouts of the connected host system to generate any specified accelerator data stream at runtime within milliseconds.
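As a rough software analogy for the layout translation described above, the sketch below converts row-oriented host records into per-stream word sequences, where each accelerator input stream is assigned an ordered list of fields. This is a hypothetical illustration of the data-movement problem only; it does not implement the paper's heuristic datapath-generation algorithm, and the names `to_streams` and `stream_fields` are invented for the example.

```python
from typing import Dict, List, Sequence


def to_streams(rows: Sequence[Dict[str, int]],
               stream_fields: Sequence[Sequence[str]]) -> List[List[int]]:
    """Convert row-oriented records into one word sequence per stream.

    stream_fields[k] lists, in order, the fields that stream k consumes
    for every record (a stand-in for the accelerator's input schedule).
    """
    streams: List[List[int]] = [[] for _ in stream_fields]
    for row in rows:
        for k, fields in enumerate(stream_fields):
            for name in fields:
                streams[k].append(row[name])
    return streams


if __name__ == "__main__":
    # Hypothetical table with three columns, stored row by row on the host.
    rows = [{"id": 1, "price": 10, "qty": 3},
            {"id": 2, "price": 20, "qty": 5}]
    # Stream 0 receives (price, qty) pairs; stream 1 receives ids only.
    print(to_streams(rows, [["price", "qty"], ["id"]]))
    # [[10, 3, 20, 5], [1, 2]]
```

In hardware, the interesting part is doing this gather without stalling the accelerator when the layout or the schedule changes at runtime, which this sequential sketch does not address.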
Citations: 5
Coarse grain reconfiguration: Power estimation and management flow for hybrid gated systems
Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857160
Tiziana Fanni, L. Raffo
This work presents an automatic power estimation and implementation flow for coarse-grained reconfigurable systems that can guide designers toward the optimal implementation of power-efficient systems. The entire flow is assessed on the reconfigurable computing core of a dedicated image processing accelerator, targeting a 45 nm ASIC technology.
Citations: 0
High-throughput cellular imaging with high-speed asymmetric-detection time-stretch optical microscopy under FPGA platform
Pub Date : 2016-11-01 DOI: 10.1109/RECONFIG.2016.7857175
Ho-Cheung Ng, Maolin Wang, Bob M. F. Chung, B. S. C. Varma, M. Jaiswal, S. M. H. Ho, K. Tsia, H. Shum, Hayden Kwok-Hay So
Asymmetric-Detection Time-Stretch Optical Microscopy (ATOM) is a recently emerged technology that provides ultra-fast cell imaging at frame rates of up to the MHz range, orders of magnitude higher than any classical imaging system. However, existing measuring instruments are unable to fully exploit the capability of ATOM. For example, the volume of ATOM imaging data quickly exceeds the capacity of the onboard buffer of a modern high-speed oscilloscope. This paper presents an open-source, FPGA-based solution that serves a dual role: collecting low-level signals from the ATOM front end, and processing and transferring the data to backing store. Optical signals are sampled by a high-speed analog-to-digital converter, and the resulting values are collected by an FPGA. The received quantized values are then further processed and divided into four segments for subsequent data transfer over 10 Gbit Ethernet. Four computing units are directly connected to these channels to reliably receive the data for post-processing. Experiments show that, with decent-quality images for single-cell analysis, the proposed system can store 10x more data than an existing high-end oscilloscope. With an 8x decrease in equipment cost, the proposed FPGA-based system should benefit many bioimaging applications of ATOM technology, such as rare cancer cell imaging and identification.
Citations: 2
Adaptive single-event effect mitigation for dependable processing systems
Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857149
R. Glein, F. Rittner, A. Heuberger
For applications in radiation-harsh environments, designers apply mitigation techniques according to the worst-case (solar) conditions to achieve a dependable design. This results in a resource overhead that is unnecessary most of the time. To overcome this problem, adaptive mitigation techniques are used. Such techniques trade off two parameters, such as performance and reliability, by toggling between different operating modes. In this context, we propose an Adaptive Single-Event Effect Mitigation (ASEEM) method. It is based on adaptive reconfiguration of an FPGA between two modes: a performance mode and a high-reliability mode. The performance mode offers high processing power and thus higher signal processing throughput. We evaluate ASEEM using particle data from 2010 to 2016 for one space-grade and two commercial-grade FPGAs. Based on the radiation data, we calculate upset rates, availability, performance, and performability. We discuss one realization of ASEEM in detail with fixed upset rates. The examples presented in this paper show the upset rate reduced to between one sixth and one ninth of that in the performance mode, and the high processing power available more than 90% of the considered time interval. We conclude that the investigated ASEEM realization is optimal for moderate and long mean times to repair. In a processing case study with a fixed mean time to repair of one hour, we obtain a performability improvement of 14% and an availability improvement of 21% over the performance mode for an FPGA using the latest semiconductor technology.
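A common first-order model relates an upset rate λ and a mean time to repair (MTTR) to steady-state availability, A = 1 / (1 + λ · MTTR), assuming exponentially distributed upsets and repairs in a two-state up/down model. The sketch below uses that model together with a simple reward-weighted measure (relative processing power × availability) to compare two modes. The model, the function names, and all numbers except the one-hour MTTR are illustrative assumptions; the paper's own availability and performability calculations may differ.

```python
def availability(upset_rate_per_h: float, mttr_h: float) -> float:
    """Steady-state availability of a two-state (up/down) model with
    upset rate lambda (1/h) and mean time to repair MTTR (h)."""
    if upset_rate_per_h < 0 or mttr_h < 0:
        raise ValueError("rates and times must be non-negative")
    # A = MTBF / (MTBF + MTTR) with MTBF = 1 / lambda
    return 1.0 / (1.0 + upset_rate_per_h * mttr_h)


def reward_rate(relative_power: float, upset_rate_per_h: float, mttr_h: float) -> float:
    """Simple performability-style measure: processing power weighted by availability."""
    return relative_power * availability(upset_rate_per_h, mttr_h)


if __name__ == "__main__":
    mttr = 1.0  # hours, matching the one-hour repair time in the case study
    # Hypothetical modes: the high-reliability mode halves processing power
    # but cuts the upset rate by a factor of six.
    perf_mode = reward_rate(1.0, 0.5, mttr)
    rel_mode = reward_rate(0.5, 0.5 / 6, mttr)
    print(f"performance mode: {perf_mode:.3f}, high-reliability mode: {rel_mode:.3f}")
```

An adaptive scheme would evaluate such a measure against the current particle flux and switch to whichever mode scores higher; how the paper actually makes that decision is not specified here.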
Citations: 3
Dual fixed-point CORDIC processor: Architecture and FPGA implementation
Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857166
Andres Jacoby, D. Llamocca
We introduce Dual Fixed Point CORDIC, which provides a compromise between fixed-point and floating-point CORDIC hardware implementations. We present a fully parameterized hardware architecture that allows extensive exploration of the resource-accuracy design space, from which we generate optimal (in the multi-objective sense) realizations. We compare Fixed Point, Dual Fixed Point, and Floating Point CORDIC units in terms of resources and accuracy. Results show the effectiveness of Dual Fixed Point for CORDIC implementation: the increase in resources is largely offset by the large improvements in accuracy.
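For reference, the plain fixed-point baseline that such designs are compared against can be sketched as a rotation-mode CORDIC that computes sine and cosine with integer shift-and-add operations. This is a generic sketch, not the authors' dual fixed-point design: `frac_bits` and `iterations` are assumed stand-ins for the word-length and iteration-count parameters whose resource-accuracy trade-off is being explored, and the input is assumed to lie in [-π/2, π/2].

```python
import math


def cordic_sin_cos(theta: float, frac_bits: int = 16, iterations: int = 16):
    """Rotation-mode CORDIC in integer (fixed-point) arithmetic.

    Returns (sin(theta), cos(theta)) for theta in [-pi/2, pi/2].
    """
    if abs(theta) > math.pi / 2:
        raise ValueError("theta must lie in [-pi/2, pi/2]")
    one = 1 << frac_bits  # fixed-point representation of 1.0
    # Elementary angles atan(2^-i), quantized to the same fixed-point format.
    atans = [round(math.atan(2.0 ** -i) * one) for i in range(iterations)]
    # Pre-scale x by the inverse CORDIC gain so no final multiply is needed.
    gain_inv = 1.0
    for i in range(iterations):
        gain_inv /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = round(gain_inv * one), 0, round(theta * one)
    for i in range(iterations):
        # Shifts replace multiplications by 2^-i; >> is an arithmetic shift.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atans[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atans[i]
    return y / one, x / one


if __name__ == "__main__":
    # Compare a narrow and a wide configuration on the same input.
    for bits, iters in [(8, 8), (16, 16)]:
        s, c = cordic_sin_cos(0.5, bits, iters)
        print(bits, iters, abs(s - math.sin(0.5)), abs(c - math.cos(0.5)))
```

Widening `frac_bits` costs adder and register width in hardware, while adding iterations costs latency or pipeline stages; a number format with more than one scaling is one way to gain accuracy without widening every datapath.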
Citations: 3
A high-efficiency runtime reconfigurable IP for CNN acceleration on a mid-range all-programmable SoC
Pub Date : 2016-11-01 DOI: 10.1109/ReConFig.2016.7857144
P. Meloni, Gianfranco Deriu, Francesco Conti, Igor Loi, L. Raffo, L. Benini
Convolutional Neural Networks (CNNs) are a nature-inspired model extensively employed in a broad range of applications in computer vision, machine learning, and pattern recognition. The CNN algorithm requires executing multiple layers, commonly called convolution layers, which apply 2D convolution filters of different sizes over a set of input image features. Such a computation kernel is intrinsically parallel and thus benefits significantly from acceleration on parallel hardware. In this work, we propose an accelerator architecture, suitable for implementation on mid- to high-range FPGA devices, that can be reconfigured at runtime to adapt to the different filter sizes of different convolution layers. We present an accelerator configuration, mapped onto a Xilinx Zynq XC-Z7045 device, that achieves up to 120 GMAC/s (16-bit precision) when executing 5×5 filters and up to 129 GMAC/s when executing 3×3 filters, consuming less than 10 W of power, reaching more than 97% DSP resource utilization at a 150 MHz operating frequency, and requiring only 16 B/cycle of I/O bandwidth.
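The convolution-layer kernel described above, and the relation between the reported GMAC/s figures and the 150 MHz clock, can be sketched in plain Python. The loop nest below is a generic multi-channel 2D convolution (cross-correlation, stride 1, no padding, as is conventional in CNNs) that also counts multiply-accumulate operations; it illustrates the workload, not the accelerator's dataflow, and the function names are assumptions. Dividing 120 GMAC/s by 150 MHz gives 800 MACs per cycle for the 5×5 configuration.

```python
from typing import List, Tuple

Tensor3 = List[List[List[float]]]          # [channel][row][col]
Weights = List[List[List[List[float]]]]    # [out_ch][in_ch][k][k]


def conv_layer(inputs: Tensor3, weights: Weights) -> Tuple[Tensor3, int]:
    """Valid (no padding), stride-1 multi-channel 2D convolution.

    Returns the output feature maps and the number of MACs performed.
    """
    c_in, h, w = len(inputs), len(inputs[0]), len(inputs[0][0])
    k = len(weights[0][0])
    h_out, w_out = h - k + 1, w - k + 1
    macs = 0
    out: Tensor3 = []
    for filt in weights:                       # one filter per output channel
        fmap = [[0.0] * w_out for _ in range(h_out)]
        for ci in range(c_in):
            for r in range(h_out):
                for c in range(w_out):
                    acc = 0.0
                    for i in range(k):
                        for j in range(k):
                            acc += inputs[ci][r + i][c + j] * filt[ci][i][j]
                            macs += 1
                    fmap[r][c] += acc
        out.append(fmap)
    return out, macs


def macs_per_cycle(gmac_per_s: float, f_mhz: float) -> float:
    """Sustained MACs per clock cycle implied by a throughput figure."""
    return gmac_per_s * 1e3 / f_mhz
```

The MAC count equals C_out × C_in × H_out × W_out × K × K, which is why the filter size K strongly affects the workload per layer. At 129 GMAC/s, the same calculation gives 860 MACs per cycle for the 3×3 configuration.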
Citations: 20
Journal: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)