Pub Date: 2016-11-30; DOI: 10.1109/ReConFig.2016.7857178
L. Jung, C. Hochberger
Coarse Grained Reconfigurable Arrays (CGRAs) can substantially boost the processing power of embedded applications. They can be included in typical system-on-chip architectures to execute computationally demanding parts of the application. Delegating execution to the CGRA requires the exchange of live-in and live-out variables between the processor core and the CGRA. In this paper we search for the optimal interface between the surrounding system and the CGRA with respect to its impact on operating frequency, resource usage, and runtime overhead.
Title: Optimal processor interface for CGRA-based accelerators implemented on FPGAs. Published in: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
Pub Date: 2016-11-01; DOI: 10.1109/ReConFig.2016.7857148
Farnoud Farahmand, Ekawat Homsirikamol, K. Gaj
Hardware performance evaluation of candidates competing in cryptographic contests, such as SHA-3 and CAESAR, is very important for ranking their suitability for standardization. One of the most essential performance metrics is throughput, which depends heavily on the algorithm, hardware implementation architecture, coding style, and tool options. The maximum throughput is calculated from the maximum clock frequency supported by each algorithm. A common way of determining the maximum clock frequency is the static timing analysis provided by CAD toolsets such as Xilinx ISE, Xilinx Vivado, and Altera Quartus Prime. In this project, we have developed a universal testbed that measures the maximum clock frequency experimentally, using a prototyping board. We target cryptographic hardware cores, such as implementations of SHA-3 candidates. Our testbed is designed using a Zynq platform and takes advantage of software/hardware co-design. It supports two separate clock domains, one for the hardware module under test and the other for the communication between an ARM core and the hardware accelerator. We measured the maximum clock frequency and the execution time of 12 Round 2 SHA-3 candidates experimentally on a ZedBoard and compared the results with the frequencies reported by Xilinx Vivado. Our results indicate that, depending on the characteristics of each algorithm, the experimentally measured frequency is either much higher than or the same as the frequency reported by the tools using static timing analysis. This behavior is then further analyzed, and the relevant conclusions are drawn.
Title: A Zynq-based testbed for the experimental benchmarking of algorithms competing in cryptographic contests. Published in: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
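The dependence of maximum throughput on maximum clock frequency can be illustrated with a minimal sketch. It assumes the common block-based relation, throughput = block size / (cycles per block × clock period); the function name and all numbers below are hypothetical and are not taken from the paper.

```python
def max_throughput_mbps(block_bits: int, cycles_per_block: int, f_max_mhz: float) -> float:
    """Throughput in Mbit/s for a core that absorbs one block every
    `cycles_per_block` clock cycles at a clock of `f_max_mhz` MHz.

    Assumed relation: throughput = block_bits / (cycles_per_block * T_clk),
    with T_clk = 1 / f_max. Bits per microsecond equals Mbit/s.
    """
    if cycles_per_block <= 0 or f_max_mhz <= 0:
        raise ValueError("cycles_per_block and f_max_mhz must be positive")
    return block_bits * f_max_mhz / cycles_per_block


# Hypothetical comparison: the same core rated at a frequency from static
# timing analysis versus a higher frequency measured on the board.
static_estimate = max_throughput_mbps(block_bits=1024, cycles_per_block=16, f_max_mhz=100.0)
measured = max_throughput_mbps(block_bits=1024, cycles_per_block=16, f_max_mhz=125.0)
print(f"static: {static_estimate:.0f} Mbit/s, measured: {measured:.0f} Mbit/s")
```

Under this relation, throughput scales linearly with the clock frequency, so any gap between the tool-reported and the experimentally measured frequency carries over directly to the reported throughput.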
Pub Date: 2016-11-01; DOI: 10.1109/ReConFig.2016.7857143
G. Kiefer, Matthias Vahl, Julian Sarcher, M. Schaeferling
Object recognition in huge image data sets or in live camera images at interactive frame rates is a very demanding task, especially within embedded systems. The recognition task includes the localization of a reference object, together with its rotation and scaling, in a search image. The Generalized Hough Transform (GHT) is known as a powerful and robust technique that supports this task by transforming the search image into a 4D parameter space. However, the GHT itself is very complex and demanding in terms of computational power and memory consumption. This paper presents a novel hardware architecture that performs a complete 4D GHT at interactive frame rates in an FPGA. The architecture is configurable to allow a trade-off between performance, accuracy, and hardware usage. The proposed architecture has been implemented in a low-cost Zynq-7000 FPGA and successfully evaluated in two practical applications, namely groyne detection in aerial images and traffic sign detection.
Title: A configurable architecture for the generalized hough transform applied to the analysis of huge aerial images and to traffic sign detection. Published in: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
Pub Date: 2016-11-01; DOI: 10.1109/ReConFig.2016.7857173
J. Joseph, Tobias Winker, Kristian Ehlers, Christopher Blochwitz, Thilo Pionteck
The focus of this work is to facilitate pose estimation and, thus, gesture recognition for embedded systems, even though these tasks have high computational requirements. To this end, an existing pose estimation algorithm is optimized for Xilinx High Level Synthesis (HLS). The resulting hardware acceleration cores are compared for different optimizations and, finally, we propose a hardware/software system design for a Xilinx Zynq ZedBoard. Using this method, we achieve a speedup of 1.6 over a software solution on the ARM processor and, thus, facilitate hand tracking for embedded systems with low power consumption.
Title: Hardware-accelerated pose estimation for embedded systems using Vivado HLS. Published in: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
Pub Date: 2016-11-01; DOI: 10.1109/ReConFig.2016.7857185
Andreas Becher, S. Wildermann, Moritz Mühlenthaler, J. Teich
Modern programmable FPGA-based SoCs that tightly couple a CPU and programmable logic enable on-demand acceleration of stream processing in hardware by exploiting the available high input and output throughput and the reconfigurability of both software and hardware. In this paper, we present the concept and implementation of a hardware unit called ReOrder that serves as a converter for multiple parallel streams of data read from and written to an accelerator. Our technique and programmable design allow flexible data access and connect different stream processing accelerators independently of the host data layout. To achieve a high accelerator throughput, it is necessary to determine an optimized datapath according to the accelerator's internal schedule of input and output data. We are concerned with an online setting, in which either the data layout (e.g., in the case of modern database systems) or the accelerator's operational mode changes dynamically. Therefore, an algorithm is required that can be used at runtime to maintain an optimized datapath configuration. We propose an efficient heuristic algorithm and a corresponding FPGA design that can translate arbitrary (multi-source) data layouts of the connected host system to generate any specified data stream of the accelerator at runtime within milliseconds.
Title: ReOrder: Runtime datapath generation for high-throughput multi-stream processing. Published in: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
Pub Date: 2016-11-01; DOI: 10.1109/ReConFig.2016.7857160
Tiziana Fanni, L. Raffo
This work presents an automatic power estimation and implementation flow for coarse-grained reconfigurable systems, capable of guiding designers towards the optimal implementation of power-efficient systems. The entire flow is assessed on the reconfigurable computing core of a dedicated image processing accelerator, targeting a 45 nm ASIC technology.
Title: Coarse grain reconfiguration: Power estimation and management flow for hybrid gated systems. Published in: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
Pub Date: 2016-11-01; DOI: 10.1109/RECONFIG.2016.7857175
Ho-Cheung Ng, Maolin Wang, Bob M. F. Chung, B. S. C. Varma, M. Jaiswal, S. M. H. Ho, K. Tsia, H. Shum, Hayden Kwok-Hay So
Asymmetric-Detection Time-Stretch Optical Microscopy (ATOM) is a recently emerged technology that provides ultra-fast cell imaging with frame rates up to the MHz range, orders of magnitude higher than any classical imaging system. However, existing measuring instruments are unable to fully exploit the capability of ATOM. For example, the volume of an ATOM imaging data set quickly grows beyond the capacity of the onboard buffer of a modern high-speed oscilloscope. This paper presents an open source, FPGA-based solution that serves a dual role: collecting low-level signals from the ATOM frontend, and processing and transferring data to a backing store. Optical signals are sampled by a high-speed analog-to-digital converter and the resulting values are collected by an FPGA. The quantized values received are then further processed and divided into four segments for subsequent data transfer over 10 Gbit Ethernet. Four computing units are directly connected to these channels in order to reliably receive the data for post-processing. Experiments show that, with images of decent quality for single-cell analysis, the proposed system can store a 10x larger data set than an existing high-end oscilloscope. With an 8x decrease in equipment cost, the proposed FPGA-based system will be beneficial for many bio-imaging applications of ATOM technology, such as rare cancer cell imaging and identification.
Title: High-throughput cellular imaging with high-speed asymmetric-detection time-stretch optical microscopy under FPGA platform. Published in: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
Pub Date: 2016-11-01; DOI: 10.1109/ReConFig.2016.7857150
A. A. Sohanghpurwala, P. Athanas
Boolean Satisfiability (SAT) is an important problem both theoretically and for a variety of practical applications. While the general SAT problem is NP-complete, advanced solver algorithms and heuristics can solve otherwise intractable problems quickly and efficiently. While much progress has been made with Conflict Driven Clause Learning (CDCL) based sequential solvers, Stochastic Local Search (SLS) solvers such as WalkSAT, Sparrow and probSAT have proven effective for certain instance types. SLS solvers are well suited to parallelization and hardware implementation because of their simple control flow and the lack of data dependencies between solver instances started with different seeds. This paper presents a hardware implementation of the probSAT algorithm that uses High-Level Synthesis (HLS) to rapidly port the design from the original C implementation. Specifically, the presented approach shows very strong performance on the class of small but difficult SAT problems, with speedups of 89–828x over MiniSAT and 5–99x over the software implementation of probSAT on such problems.
Title: An effective probability distribution SAT solver on reconfigurable hardware. Published in: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
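To make the stochastic local search idea behind probSAT concrete, the following is a minimal software sketch, not the paper's HLS design. It assumes the break-only exponential selection rule, in which a variable from a randomly chosen unsatisfied clause is flipped with probability proportional to `cb ** -break`; the function name, the `cb` value, and the naive clause scans are illustrative assumptions.

```python
import random


def probsat(clauses, num_vars, max_flips=100_000, cb=2.3, seed=0):
    """Return a satisfying assignment as a list of bools, or None.

    `clauses` is a list of clauses, each a list of nonzero ints in
    DIMACS style (3 means x3, -3 means NOT x3).
    """
    rng = random.Random(seed)
    # Index 0 is unused so that variable v lives at assign[v].
    assign = [False] + [rng.random() < 0.5 for _ in range(num_vars)]

    def lit_true(lit):
        return assign[abs(lit)] == (lit > 0)

    def break_count(var):
        # Number of clauses whose only true literal belongs to `var`:
        # flipping `var` would make exactly these clauses unsatisfied.
        count = 0
        for clause in clauses:
            true_lits = [lit for lit in clause if lit_true(lit)]
            if len(true_lits) == 1 and abs(true_lits[0]) == var:
                count += 1
        return count

    def unsatisfied():
        return [c for c in clauses if not any(lit_true(lit) for lit in c)]

    for _ in range(max_flips):
        unsat = unsatisfied()
        if not unsat:
            return assign[1:]
        clause = rng.choice(unsat)
        variables = [abs(lit) for lit in clause]
        # Variables that break fewer clauses get exponentially more weight.
        weights = [cb ** -break_count(v) for v in variables]
        var = rng.choices(variables, weights=weights, k=1)[0]
        assign[var] = not assign[var]

    return assign[1:] if not unsatisfied() else None


# (x1 OR x2) AND (NOT x1 OR x2) AND (x1 OR NOT x2): only x1 = x2 = True works.
print(probsat([[1, 2], [-1, 2], [1, -2]], num_vars=2))
```

Each call depends only on its own seed and assignment, which is the property the abstract points to when it describes SLS solvers as easy to replicate in parallel.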
Pub Date: 2016-11-01; DOI: 10.1109/ReConFig.2016.7857149
R. Glein, F. Rittner, A. Heuberger
For applications in radiation-harsh environments, designers apply mitigation techniques according to the worst-case (solar) condition to achieve a dependable design. This results in a resource overhead that is unnecessary most of the time. To overcome this problem, adaptive mitigation techniques are used. Such a technique trades off two parameters, such as performance and reliability, by toggling between different operating modes. In this context, we propose an Adaptive Single-Event Effect Mitigation (ASEEM) method. It is based on adaptive reconfiguration of an FPGA between two modes, specifically a performance mode and a high-reliability mode. The performance mode offers high processing power and thus higher signal processing throughput. We evaluate ASEEM by calculating results with particle data from 2010 until 2016 for one space-grade and two commercial-grade FPGAs. Based on radiation data, we calculate upset rates, availability, performance, and performability. We discuss one realization of ASEEM in detail with fixed upset rates. The examples presented in this paper show a reduction of the upset rate to between a sixth and a ninth (compared with the performance mode), while the high processing power is available for over 90% of the considered time interval. We conclude that the investigated ASEEM realization is optimal for moderate and long mean times to repair. In a processing case study with a fixed mean time to repair of one hour, we obtain a performability improvement of 14% and an availability improvement of 21% over the performance mode for an FPGA using the latest semiconductor technology.
Title: Adaptive single-event effect mitigation for dependable processing systems. Published in: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
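The relationship between upset rate, mean time to repair, and availability can be sketched with the textbook steady-state formula A = MTTF / (MTTF + MTTR), where MTTF = 1 / upset rate. This is an assumption about how such figures are typically derived, not the paper's exact model, and all numbers below are hypothetical.

```python
def steady_state_availability(upset_rate_per_hour: float, mttr_hours: float) -> float:
    """Textbook steady-state availability.

    A = MTTF / (MTTF + MTTR) with MTTF = 1 / upset_rate, which
    simplifies to 1 / (1 + upset_rate * MTTR).
    """
    if upset_rate_per_hour < 0 or mttr_hours < 0:
        raise ValueError("rate and repair time must be non-negative")
    return 1.0 / (1.0 + upset_rate_per_hour * mttr_hours)


# Hypothetical rates: a mode whose upset rate is six times lower
# recovers availability when repairs take one hour.
performance_mode = steady_state_availability(upset_rate_per_hour=0.25, mttr_hours=1.0)
reliable_mode = steady_state_availability(upset_rate_per_hour=0.25 / 6, mttr_hours=1.0)
print(f"performance mode: {performance_mode:.3f}, reliable mode: {reliable_mode:.3f}")
```

In this simple model the benefit of a lower upset rate grows with the mean time to repair, which is in line with the abstract's remark that the approach pays off for moderate and long repair times.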
Pub Date: 2016-11-01; DOI: 10.1109/ReConFig.2016.7857166
Andres Jacoby, D. Llamocca
We introduce Dual Fixed Point CORDIC, which provides a compromise between Fixed Point and Floating Point CORDIC hardware implementations. We present a fully parameterized hardware design that allows extensive exploration of the resources-accuracy design space, from which we generate optimal (in the multi-objective sense) realizations. We compare Fixed Point, Dual Fixed Point, and Floating Point CORDIC units in terms of resources and accuracy. Results show the effectiveness of Dual Fixed Point for CORDIC implementation, where the increase in resources is largely offset by the large accuracy improvements.
Title: Dual fixed-point CORDIC processor: Architecture and FPGA implementation. Published in: 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
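For readers unfamiliar with CORDIC, the sketch below shows the plain fixed-point baseline in rotation mode, computing sine and cosine with only shifts and adds on integers. It does not model the Dual Fixed Point number format from the paper; the function name, the word length (`frac_bits`), and the iteration count are illustrative assumptions.

```python
import math


def cordic_sin_cos(angle_rad: float, iterations: int = 16, frac_bits: int = 16):
    """Rotation-mode CORDIC on integers; intended for |angle| <= pi/2.

    Returns (sin, cos) converted back to floats for inspection.
    """
    scale = 1 << frac_bits
    # Arctangent table, quantized to the fixed-point grid.
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]
    # CORDIC gain K = prod(sqrt(1 + 2^(-2i))); start at x = 1/K to cancel it.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x = round(scale / gain)
    y = 0
    z = round(angle_rad * scale)  # residual angle still to rotate
    for i in range(iterations):
        if z >= 0:
            # Rotate counter-clockwise by atan(2^-i).
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            # Rotate clockwise by atan(2^-i).
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]
    return y / scale, x / scale


s, c = cordic_sin_cos(math.pi / 6)
print(f"sin ~ {s:.4f}, cos ~ {c:.4f}")
```

Varying `frac_bits` and `iterations` in such a model is one simple way to see the resources-accuracy trade-off that the abstract explores in hardware.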