{"title":"Session details: Applications 1","authors":"P. Lysaght","doi":"10.1145/3260939","DOIUrl":"https://doi.org/10.1145/3260939","url":null,"abstract":"","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"25 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116630762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Architecture","authors":"M. Hutton","doi":"10.1145/3260937","DOIUrl":"https://doi.org/10.1145/3260937","url":null,"abstract":"","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125928707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. M. Abdellatif, R. Chotin-Avot, Z. Marrakchi, H. Mehrez, Qingshan Tang
AES-GCM has been utilized in various security applications. It consists of two components: an Advanced Encryption Standard (AES) engine and a Galois Hash (GHASH) core. The performance of the system is determined by the GHASH architecture because of the inherent computation feedback. This paper introduces a modification for the pipelined Karatsuba Ofman Algorithm (KOA)-based GHASH. In particular, the computation feedback is removed by analyzing the complexity of the computation process. The proposed GHASH core is evaluated with three different implementations of AES ( BRAMs-based SubBytes, composite field-based SubBytes, and LUT-based SubBytes). The presented AES-GCM architectures are implemented using Xilinx Virtex5 FPGAs. Our comparison to previous work reveals that our architectures are more performance-efficient (Thr. /Slices).
{"title":"Towards high performance GHASH for pipelined AES-GCM using FPGAs (abstract only)","authors":"K. M. Abdellatif, R. Chotin-Avot, Z. Marrakchi, H. Mehrez, Qingshan Tang","doi":"10.1145/2554688.2554709","DOIUrl":"https://doi.org/10.1145/2554688.2554709","url":null,"abstract":"AES-GCM has been utilized in various security applications. It consists of two components: an Advanced Encryption Standard (AES) engine and a Galois Hash (GHASH) core. The performance of the system is determined by the GHASH architecture because of the inherent computation feedback. This paper introduces a modification for the pipelined Karatsuba Ofman Algorithm (KOA)-based GHASH. In particular, the computation feedback is removed by analyzing the complexity of the computation process. The proposed GHASH core is evaluated with three different implementations of AES ( BRAMs-based SubBytes, composite field-based SubBytes, and LUT-based SubBytes). The presented AES-GCM architectures are implemented using Xilinx Virtex5 FPGAs. Our comparison to previous work reveals that our architectures are more performance-efficient (Thr. /Slices).","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128922322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FPGA behavioral synthesis has gained significant momentum recently with the growing interests in accelerating high-performance computing applications. While the latest generation of high-level synthesis (HLS) tools has made significant progress, they still lack the support for certain high-level language features such as dynamic memory allocation, despite the fact that efficiently utilization of the on-chip memory resources in FPGAs is critical to achieve the performance and power consumption target for many designs. To tackle the above problem, in this paper, we propose a novel hybrid memory allocation scheme to map malloc/free in C programing language onto FPGA platforms. By estimating the memory usage and available FPGA memory resources, the scheme judiciously allocates static memory blocks and/or instantiate hardware allocators for memory requests. And the partition between these two parts is based on estimated access counts and solving an ILP to minimize overhead from dynamic memory allocation. Experimental results on benchmark circuits demonstrate the efficacy of the proposed technique.
{"title":"On hybrid memory allocation for FPGA behavioral synthesis (abstract only)","authors":"Qian Zhang, Chenfei Ma, Q. Xu","doi":"10.1145/2554688.2554697","DOIUrl":"https://doi.org/10.1145/2554688.2554697","url":null,"abstract":"FPGA behavioral synthesis has gained significant momentum recently with the growing interests in accelerating high-performance computing applications. While the latest generation of high-level synthesis (HLS) tools has made significant progress, they still lack the support for certain high-level language features such as dynamic memory allocation, despite the fact that efficiently utilization of the on-chip memory resources in FPGAs is critical to achieve the performance and power consumption target for many designs. To tackle the above problem, in this paper, we propose a novel hybrid memory allocation scheme to map malloc/free in C programing language onto FPGA platforms. By estimating the memory usage and available FPGA memory resources, the scheme judiciously allocates static memory blocks and/or instantiate hardware allocators for memory requests. And the partition between these two parts is based on estimated access counts and solving an ILP to minimize overhead from dynamic memory allocation. Experimental results on benchmark circuits demonstrate the efficacy of the proposed technique.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125419844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Today's SRAM-based FPGAs provide a reach set of computing resources which makes them attractive in demanding and critical application domains, such as avionics and space. Unfortunately, their high reliance on SRAM configuration memory arise reliability issues due to the single-event upsets (SEUs). Considering the criticality of these applications, the vulnerability analysis of FPGA designs to SEUs becomes essential part of the design flow. In this context, we present an open-source framework for the soft error vulnerability analysis of Xilinx FPGA devices. The proposed framework will allow researchers to evaluate their reliability-aware CAD algorithms and estimate the soft error susceptibility of the designs at early stages of the implementation flow for the latest Xilinx architectures.
{"title":"A soft error vulnerability analysis framework for Xilinx FPGAs","authors":"Aitzan Sari, D. Agiakatsikas, M. Psarakis","doi":"10.1145/2554688.2554767","DOIUrl":"https://doi.org/10.1145/2554688.2554767","url":null,"abstract":"Today's SRAM-based FPGAs provide a reach set of computing resources which makes them attractive in demanding and critical application domains, such as avionics and space. Unfortunately, their high reliance on SRAM configuration memory arise reliability issues due to the single-event upsets (SEUs). Considering the criticality of these applications, the vulnerability analysis of FPGA designs to SEUs becomes essential part of the design flow. In this context, we present an open-source framework for the soft error vulnerability analysis of Xilinx FPGA devices. The proposed framework will allow researchers to evaluate their reliability-aware CAD algorithms and estimate the soft error susceptibility of the designs at early stages of the implementation flow for the latest Xilinx architectures.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129151347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Processors and systems","authors":"M. Leeser","doi":"10.1145/3260940","DOIUrl":"https://doi.org/10.1145/3260940","url":null,"abstract":"","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131789643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, electronic industries have been facing an increased amount of hardware counterfeits. These counterfeit components, when assembled into a product or a system, can not only jeopardize performance and reliability but also create safety issues. Physical Unclonable Function (PUF) provides means to enhance physical security of Integrated Circuits (IC) against piracy and unauthorized access. The proposed design illustrates the feasibility of using self-timed ring oscillators as a novel approach towards PUF implementation for FPGA authentication. The proposed Self-Timed Ring Oscillator PUF (STRO-PUF) consists of two groups of identically laid-out self-timed ring oscillators. Inputs to the PUF are given through a challenge generator, which selects two self-timed ring oscillators from each group. Outputs of oscillators are fed to multiplexers of corresponding groups. Self-timed ring oscillators exploit the inherent features of random process variations by producing varying frequencies. These unpredictable variations in frequencies are captured using frequency comparator, which generates a output bit. A unique set of output bits , or response is generated for each set of input bits, or challenge. This unique Challenge Response Pair (CRP) is used in identifying a particular device. Frequencies generated from these oscillators are read through a logic analyzer. The varying frequencies observed from all the oscillators mapped across different regions of FPGAs range from 16.234 MHz to 125 MHz with the average frequency of 101.446 MHz. Experimental result shows the uniqueness for the PUF response is 49.92% which is very close to the desired 50% factor.
{"title":"Asynchronous physical unclonable function using FPGA-based self-timed ring oscillator (abstract only)","authors":"R. Silwal, M. Niamat","doi":"10.1145/2554688.2554745","DOIUrl":"https://doi.org/10.1145/2554688.2554745","url":null,"abstract":"Recently, electronic industries have been facing an increased amount of hardware counterfeits. These counterfeit components, when assembled into a product or a system, can not only jeopardize performance and reliability but also create safety issues. Physical Unclonable Function (PUF) provides means to enhance physical security of Integrated Circuits (IC) against piracy and unauthorized access. The proposed design illustrates the feasibility of using self-timed ring oscillators as a novel approach towards PUF implementation for FPGA authentication. The proposed Self-Timed Ring Oscillator PUF (STRO-PUF) consists of two groups of identically laid-out self-timed ring oscillators. Inputs to the PUF are given through a challenge generator, which selects two self-timed ring oscillators from each group. Outputs of oscillators are fed to multiplexers of corresponding groups. Self-timed ring oscillators exploit the inherent features of random process variations by producing varying frequencies. These unpredictable variations in frequencies are captured using frequency comparator, which generates a output bit. A unique set of output bits , or response is generated for each set of input bits, or challenge. This unique Challenge Response Pair (CRP) is used in identifying a particular device. Frequencies generated from these oscillators are read through a logic analyzer. The varying frequencies observed from all the oscillators mapped across different regions of FPGAs range from 16.234 MHz to 125 MHz with the average frequency of 101.446 MHz. Experimental result shows the uniqueness for the PUF response is 49.92% which is very close to the desired 50% factor.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130242236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Hutchings, Joshua S. Monson, D. Savory, J. Keeley
A novel Digital to Analog Converter (DAC) modulates the overall power consumption of an FPGA by disabling/enabling short circuits programmed into the interconnect. The power pin of the FPGA serves as the output of the DAC. The DAC achieves high linearity and can be used to implement applications in communications, security, etc. The shortcircuit-based DAC consumes 1/3 the area of an alternative shift-register-based DAC that is presented for the sake of comparison.
{"title":"A power side-channel-based digital to analog converterfor Xilinx FPGAs","authors":"B. Hutchings, Joshua S. Monson, D. Savory, J. Keeley","doi":"10.1145/2554688.2554770","DOIUrl":"https://doi.org/10.1145/2554688.2554770","url":null,"abstract":"A novel Digital to Analog Converter (DAC) modulates the overall power consumption of an FPGA by disabling/enabling short circuits programmed into the interconnect. The power pin of the FPGA serves as the output of the DAC. The DAC achieves high linearity and can be used to implement applications in communications, security, etc. The shortcircuit-based DAC consumes 1/3 the area of an alternative shift-register-based DAC that is presented for the sake of comparison.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125770085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents an adaptive heap sort architecture for an image coding implementation on FPGA, which specifically addresses the issue of sorting different amount of data located in each subband during the coding. The proposed sorting architecture is easily scalable. Performance of the sorter only depends on the amount of data sorted. The efficient usage of dual port memories yields high throughput up to 50 Msamples/s and their adaptive trigger/shutdown provide the average dynamic power reduction up to 20.9%. We designed this architecture and incorporated it in our Adaptive Scanning of Wavelet Data (ASWD) module which reorganizes the wavelet coefficients into locally stationary sequences for a wavelet-based image encoder. We validated the hardware on an Altera's Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. The architectural innovations can also be exploited in other applications that require high throughput and scalable sorting. Our experiments show that compared to an embedded ARM CortexA9 processor running at 666 MHz, our architecture at 100 MHz can provide around 13X speedup while consuming 242 mW average core dynamic power.
本文提出了一种用于FPGA图像编码实现的自适应堆排序架构,该架构具体解决了编码过程中位于每个子带的不同数据量的排序问题。所建议的排序体系结构易于扩展。排序器的性能仅取决于排序的数据量。双端口存储器的有效使用可产生高达50 Msamples/s的高吞吐量,其自适应触发/关闭可提供高达20.9%的平均动态功耗降低。我们设计了这种结构,并将其整合到我们的小波数据自适应扫描(ASWD)模块中,该模块将小波系数重组为局部平稳序列,用于基于小波的图像编码器。我们在Altera的Stratix IV FPGA上验证了硬件作为基于Nios II处理器的片上系统的IP加速器。架构上的创新也可以用于其他需要高吞吐量和可扩展排序的应用程序。我们的实验表明,与运行在666 MHz的嵌入式ARM CortexA9处理器相比,我们的架构在100 MHz时可以提供大约13倍的加速,同时消耗242 mW的平均核心动态功率。
{"title":"A power-efficient adaptive heapsort for fpga-based image coding application (abstract only)","authors":"Yuhui Bai, S. Z. Ahmed, B. Granado","doi":"10.1145/2554688.2554746","DOIUrl":"https://doi.org/10.1145/2554688.2554746","url":null,"abstract":"This paper presents an adaptive heap sort architecture for an image coding implementation on FPGA, which specifically addresses the issue of sorting different amount of data located in each subband during the coding. The proposed sorting architecture is easily scalable. Performance of the sorter only depends on the amount of data sorted. The efficient usage of dual port memories yields high throughput up to 50 Msamples/s and their adaptive trigger/shutdown provide the average dynamic power reduction up to 20.9%. We designed this architecture and incorporated it in our Adaptive Scanning of Wavelet Data (ASWD) module which reorganizes the wavelet coefficients into locally stationary sequences for a wavelet-based image encoder. We validated the hardware on an Altera's Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. The architectural innovations can also be exploited in other applications that require high throughput and scalable sorting. Our experiments show that compared to an embedded ARM CortexA9 processor running at 666 MHz, our architecture at 100 MHz can provide around 13X speedup while consuming 242 mW average core dynamic power.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"509 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132479074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optical flow computation is widely used in many video/image based applications such as motion detection, video compression etc. Dense optical flow field that provides more details of information is more useful in lots of applications. However, high-quality algorithms for dense optical flow computation are computationally expensive. For instance, on the ARM Cortex-A9 processor within ZYNQ, the popular linear variational method Combine-Brightness-Gradient (CBG), spends $26.68s per frame to compute optical flow when the image size is 640 x 480. It is difficult to be sped up especially when embedded systems with power constraints are considered. Poor portability is another factor to limit current implementations of optical flow computation to be used in more applications. In this paper, a high-performance, low-power FPGA-accelerated implementation of dense optical flow computation is presented. One high-quality dense optical flow method, the Combine-Brightness-Gradient model, is implemented. C code instead of VHDL/Verilog HDL is used to improve the productivity. Portability of the system is designed carefully for deploying it on different platforms conveniently. Experimental results show 12 fps and 0.38J per frame are achieved by this optical flow computing system when 640 x 480 image is used and optical flow for all pixels are computed. Furthermore, portability is demonstrated by implementing the optical flow algorithm on different heterogeneous platforms such as the ZYNQ-7000 SoC and the PC-FPGA platform with a Kintex-7 FPGA respectively.
{"title":"Implementing FPGA-based energy-efficient dense optical flow computation with high portability in C (abstract only)","authors":"Zhibin Wang, Wenmin Yang, Jin Yu, Zhilei Chai","doi":"10.1145/2554688.2554733","DOIUrl":"https://doi.org/10.1145/2554688.2554733","url":null,"abstract":"Optical flow computation is widely used in many video/image based applications such as motion detection, video compression etc. Dense optical flow field that provides more details of information is more useful in lots of applications. However, high-quality algorithms for dense optical flow computation are computationally expensive. For instance, on the ARM Cortex-A9 processor within ZYNQ, the popular linear variational method Combine-Brightness-Gradient (CBG), spends $26.68s per frame to compute optical flow when the image size is 640 x 480. It is difficult to be sped up especially when embedded systems with power constraints are considered. Poor portability is another factor to limit current implementations of optical flow computation to be used in more applications. In this paper, a high-performance, low-power FPGA-accelerated implementation of dense optical flow computation is presented. One high-quality dense optical flow method, the Combine-Brightness-Gradient model, is implemented. C code instead of VHDL/Verilog HDL is used to improve the productivity. Portability of the system is designed carefully for deploying it on different platforms conveniently. Experimental results show 12 fps and 0.38J per frame are achieved by this optical flow computing system when 640 x 480 image is used and optical flow for all pixels are computed. Furthermore, portability is demonstrated by implementing the optical flow algorithm on different heterogeneous platforms such as the ZYNQ-7000 SoC and the PC-FPGA platform with a Kintex-7 FPGA respectively.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130140292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}