首页 > 最新文献

Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays最新文献

英文 中文
Session details: Applications 1 会话详细信息
P. Lysaght
{"title":"Session details: Applications 1","authors":"P. Lysaght","doi":"10.1145/3260939","DOIUrl":"https://doi.org/10.1145/3260939","url":null,"abstract":"","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116630762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Session details: Architecture 会话细节:架构
M. Hutton
{"title":"Session details: Architecture","authors":"M. Hutton","doi":"10.1145/3260937","DOIUrl":"https://doi.org/10.1145/3260937","url":null,"abstract":"","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125928707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Towards high performance GHASH for pipelined AES-GCM using FPGAs (abstract only) 使用fpga实现流水线AES-GCM的高性能GHASH(仅抽象)
K. M. Abdellatif, R. Chotin-Avot, Z. Marrakchi, H. Mehrez, Qingshan Tang
AES-GCM has been utilized in various security applications. It consists of two components: an Advanced Encryption Standard (AES) engine and a Galois Hash (GHASH) core. The performance of the system is determined by the GHASH architecture because of the inherent computation feedback. This paper introduces a modification for the pipelined Karatsuba Ofman Algorithm (KOA)-based GHASH. In particular, the computation feedback is removed by analyzing the complexity of the computation process. The proposed GHASH core is evaluated with three different implementations of AES ( BRAMs-based SubBytes, composite field-based SubBytes, and LUT-based SubBytes). The presented AES-GCM architectures are implemented using Xilinx Virtex5 FPGAs. Our comparison to previous work reveals that our architectures are more performance-efficient (Thr. /Slices).
AES-GCM已在各种安全应用中得到应用。它由两个部分组成:高级加密标准(AES)引擎和伽罗瓦散列(GHASH)核心。由于固有的计算反馈,系统的性能由GHASH体系结构决定。本文介绍了一种基于流水线Karatsuba Ofman算法(KOA)的GHASH的改进。特别地,通过分析计算过程的复杂性,消除了计算反馈。提出的GHASH核心使用三种不同的AES实现(基于brams的SubBytes,基于复合字段的SubBytes和基于lut的SubBytes)进行评估。提出的AES-GCM架构是使用Xilinx Virtex5 fpga实现的。我们与以前的工作进行了比较,发现我们的体系结构更具有性能效率。/片)。
{"title":"Towards high performance GHASH for pipelined AES-GCM using FPGAs (abstract only)","authors":"K. M. Abdellatif, R. Chotin-Avot, Z. Marrakchi, H. Mehrez, Qingshan Tang","doi":"10.1145/2554688.2554709","DOIUrl":"https://doi.org/10.1145/2554688.2554709","url":null,"abstract":"AES-GCM has been utilized in various security applications. It consists of two components: an Advanced Encryption Standard (AES) engine and a Galois Hash (GHASH) core. The performance of the system is determined by the GHASH architecture because of the inherent computation feedback. This paper introduces a modification for the pipelined Karatsuba Ofman Algorithm (KOA)-based GHASH. In particular, the computation feedback is removed by analyzing the complexity of the computation process. The proposed GHASH core is evaluated with three different implementations of AES ( BRAMs-based SubBytes, composite field-based SubBytes, and LUT-based SubBytes). The presented AES-GCM architectures are implemented using Xilinx Virtex5 FPGAs. Our comparison to previous work reveals that our architectures are more performance-efficient (Thr. /Slices).","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128922322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
On hybrid memory allocation for FPGA behavioral synthesis (abstract only) FPGA行为综合的混合内存分配(仅摘要)
Qian Zhang, Chenfei Ma, Q. Xu
FPGA behavioral synthesis has gained significant momentum recently with the growing interests in accelerating high-performance computing applications. While the latest generation of high-level synthesis (HLS) tools has made significant progress, they still lack the support for certain high-level language features such as dynamic memory allocation, despite the fact that efficiently utilization of the on-chip memory resources in FPGAs is critical to achieve the performance and power consumption target for many designs. To tackle the above problem, in this paper, we propose a novel hybrid memory allocation scheme to map malloc/free in C programing language onto FPGA platforms. By estimating the memory usage and available FPGA memory resources, the scheme judiciously allocates static memory blocks and/or instantiate hardware allocators for memory requests. And the partition between these two parts is based on estimated access counts and solving an ILP to minimize overhead from dynamic memory allocation. Experimental results on benchmark circuits demonstrate the efficacy of the proposed technique.
近年来,随着人们对加速高性能计算应用的兴趣日益浓厚,FPGA行为合成获得了显著的发展势头。虽然最新一代的高级合成(HLS)工具已经取得了重大进展,但它们仍然缺乏对某些高级语言功能的支持,例如动态内存分配,尽管有效利用fpga中的片上内存资源对于实现许多设计的性能和功耗目标至关重要。为了解决上述问题,本文提出了一种新的混合内存分配方案,将C语言中的malloc/free映射到FPGA平台上。通过估计内存使用和可用的FPGA内存资源,该方案明智地为内存请求分配静态内存块和/或实例化硬件分配器。这两个部分之间的分区是基于估计的访问计数和求解ILP来最小化动态内存分配的开销。在基准电路上的实验结果证明了该方法的有效性。
{"title":"On hybrid memory allocation for FPGA behavioral synthesis (abstract only)","authors":"Qian Zhang, Chenfei Ma, Q. Xu","doi":"10.1145/2554688.2554697","DOIUrl":"https://doi.org/10.1145/2554688.2554697","url":null,"abstract":"FPGA behavioral synthesis has gained significant momentum recently with the growing interests in accelerating high-performance computing applications. While the latest generation of high-level synthesis (HLS) tools has made significant progress, they still lack the support for certain high-level language features such as dynamic memory allocation, despite the fact that efficiently utilization of the on-chip memory resources in FPGAs is critical to achieve the performance and power consumption target for many designs. To tackle the above problem, in this paper, we propose a novel hybrid memory allocation scheme to map malloc/free in C programing language onto FPGA platforms. By estimating the memory usage and available FPGA memory resources, the scheme judiciously allocates static memory blocks and/or instantiate hardware allocators for memory requests. And the partition between these two parts is based on estimated access counts and solving an ILP to minimize overhead from dynamic memory allocation. Experimental results on benchmark circuits demonstrate the efficacy of the proposed technique.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125419844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A soft error vulnerability analysis framework for Xilinx FPGAs Xilinx fpga软错误漏洞分析框架
Aitzan Sari, D. Agiakatsikas, M. Psarakis
Today's SRAM-based FPGAs provide a reach set of computing resources which makes them attractive in demanding and critical application domains, such as avionics and space. Unfortunately, their high reliance on SRAM configuration memory arise reliability issues due to the single-event upsets (SEUs). Considering the criticality of these applications, the vulnerability analysis of FPGA designs to SEUs becomes essential part of the design flow. In this context, we present an open-source framework for the soft error vulnerability analysis of Xilinx FPGA devices. The proposed framework will allow researchers to evaluate their reliability-aware CAD algorithms and estimate the soft error susceptibility of the designs at early stages of the implementation flow for the latest Xilinx architectures.
今天基于sram的fpga提供了一套广泛的计算资源,这使得它们在要求苛刻和关键的应用领域(如航空电子和空间)具有吸引力。不幸的是,它们对SRAM配置内存的高度依赖导致了单事件干扰(seu)的可靠性问题。考虑到这些应用的重要性,FPGA设计对seu的脆弱性分析成为设计流程中必不可少的一部分。在此背景下,我们提出了一个开源框架,用于分析Xilinx FPGA器件的软错误漏洞。提出的框架将允许研究人员评估他们的可靠性感知CAD算法,并在最新赛灵思架构的实施流程的早期阶段估计设计的软错误敏感性。
{"title":"A soft error vulnerability analysis framework for Xilinx FPGAs","authors":"Aitzan Sari, D. Agiakatsikas, M. Psarakis","doi":"10.1145/2554688.2554767","DOIUrl":"https://doi.org/10.1145/2554688.2554767","url":null,"abstract":"Today's SRAM-based FPGAs provide a reach set of computing resources which makes them attractive in demanding and critical application domains, such as avionics and space. Unfortunately, their high reliance on SRAM configuration memory arise reliability issues due to the single-event upsets (SEUs). Considering the criticality of these applications, the vulnerability analysis of FPGA designs to SEUs becomes essential part of the design flow. In this context, we present an open-source framework for the soft error vulnerability analysis of Xilinx FPGA devices. The proposed framework will allow researchers to evaluate their reliability-aware CAD algorithms and estimate the soft error susceptibility of the designs at early stages of the implementation flow for the latest Xilinx architectures.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129151347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 23
Session details: Processors and systems 会话细节:处理器和系统
M. Leeser
{"title":"Session details: Processors and systems","authors":"M. Leeser","doi":"10.1145/3260940","DOIUrl":"https://doi.org/10.1145/3260940","url":null,"abstract":"","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131789643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Asynchronous physical unclonable function using FPGA-based self-timed ring oscillator (abstract only) 基于fpga的自定时环振荡器的异步物理不可克隆功能(仅抽象)
R. Silwal, M. Niamat
Recently, electronic industries have been facing an increased amount of hardware counterfeits. These counterfeit components, when assembled into a product or a system, can not only jeopardize performance and reliability but also create safety issues. Physical Unclonable Function (PUF) provides means to enhance physical security of Integrated Circuits (IC) against piracy and unauthorized access. The proposed design illustrates the feasibility of using self-timed ring oscillators as a novel approach towards PUF implementation for FPGA authentication. The proposed Self-Timed Ring Oscillator PUF (STRO-PUF) consists of two groups of identically laid-out self-timed ring oscillators. Inputs to the PUF are given through a challenge generator, which selects two self-timed ring oscillators from each group. Outputs of oscillators are fed to multiplexers of corresponding groups. Self-timed ring oscillators exploit the inherent features of random process variations by producing varying frequencies. These unpredictable variations in frequencies are captured using frequency comparator, which generates a output bit. A unique set of output bits , or response is generated for each set of input bits, or challenge. This unique Challenge Response Pair (CRP) is used in identifying a particular device. Frequencies generated from these oscillators are read through a logic analyzer. The varying frequencies observed from all the oscillators mapped across different regions of FPGAs range from 16.234 MHz to 125 MHz with the average frequency of 101.446 MHz. Experimental result shows the uniqueness for the PUF response is 49.92% which is very close to the desired 50% factor.
最近,电子行业面临着越来越多的硬件假冒。这些假冒部件在组装成产品或系统时,不仅会危及性能和可靠性,还会产生安全问题。物理不可克隆功能(PUF)提供了一种增强集成电路(IC)物理安全性的手段,以防止盗版和未经授权的访问。提出的设计说明了使用自定时环振荡器作为FPGA认证PUF实现的新方法的可行性。所提出的自定时环振PUF (STRO-PUF)由两组相同布局的自定时环振组成。PUF的输入通过挑战发生器提供,挑战发生器从每组中选择两个自定时环振荡器。振荡器的输出被馈送到相应组的多路复用器。自定时环振荡器通过产生不同的频率来利用随机过程变化的固有特征。这些不可预测的频率变化使用频率比较器捕获,它产生一个输出位。一组唯一的输出位或响应为每组输入位或挑战生成。这种独特的挑战响应对(CRP)用于识别特定的设备。从这些振荡器产生的频率通过逻辑分析仪读取。从fpga不同区域的所有振荡器中观察到的频率变化范围从16.234 MHz到125 MHz,平均频率为101.446 MHz。实验结果表明,PUF响应的唯一性为49.92%,与期望的50%因子非常接近。
{"title":"Asynchronous physical unclonable function using FPGA-based self-timed ring oscillator (abstract only)","authors":"R. Silwal, M. Niamat","doi":"10.1145/2554688.2554745","DOIUrl":"https://doi.org/10.1145/2554688.2554745","url":null,"abstract":"Recently, electronic industries have been facing an increased amount of hardware counterfeits. These counterfeit components, when assembled into a product or a system, can not only jeopardize performance and reliability but also create safety issues. Physical Unclonable Function (PUF) provides means to enhance physical security of Integrated Circuits (IC) against piracy and unauthorized access. The proposed design illustrates the feasibility of using self-timed ring oscillators as a novel approach towards PUF implementation for FPGA authentication. The proposed Self-Timed Ring Oscillator PUF (STRO-PUF) consists of two groups of identically laid-out self-timed ring oscillators. Inputs to the PUF are given through a challenge generator, which selects two self-timed ring oscillators from each group. Outputs of oscillators are fed to multiplexers of corresponding groups. Self-timed ring oscillators exploit the inherent features of random process variations by producing varying frequencies. These unpredictable variations in frequencies are captured using frequency comparator, which generates a output bit. A unique set of output bits , or response is generated for each set of input bits, or challenge. This unique Challenge Response Pair (CRP) is used in identifying a particular device. Frequencies generated from these oscillators are read through a logic analyzer. The varying frequencies observed from all the oscillators mapped across different regions of FPGAs range from 16.234 MHz to 125 MHz with the average frequency of 101.446 MHz. Experimental result shows the uniqueness for the PUF response is 49.92% which is very close to the desired 50% factor.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130242236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
A power side-channel-based digital to analog converterfor Xilinx FPGAs 基于功率侧通道的Xilinx fpga数模转换器
B. Hutchings, Joshua S. Monson, D. Savory, J. Keeley
A novel Digital to Analog Converter (DAC) modulates the overall power consumption of an FPGA by disabling/enabling short circuits programmed into the interconnect. The power pin of the FPGA serves as the output of the DAC. The DAC achieves high linearity and can be used to implement applications in communications, security, etc. The shortcircuit-based DAC consumes 1/3 the area of an alternative shift-register-based DAC that is presented for the sake of comparison.
一种新型的数模转换器(DAC)通过禁用/启用编程到互连中的短路来调制FPGA的总功耗。FPGA的电源引脚作为DAC的输出。该DAC实现了高线性度,可用于实现通信、安全等应用。基于短路的DAC消耗的面积是为了比较而提出的基于移位寄存器的DAC的1/3。
{"title":"A power side-channel-based digital to analog converterfor Xilinx FPGAs","authors":"B. Hutchings, Joshua S. Monson, D. Savory, J. Keeley","doi":"10.1145/2554688.2554770","DOIUrl":"https://doi.org/10.1145/2554688.2554770","url":null,"abstract":"A novel Digital to Analog Converter (DAC) modulates the overall power consumption of an FPGA by disabling/enabling short circuits programmed into the interconnect. The power pin of the FPGA serves as the output of the DAC. The DAC achieves high linearity and can be used to implement applications in communications, security, etc. The shortcircuit-based DAC consumes 1/3 the area of an alternative shift-register-based DAC that is presented for the sake of comparison.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125770085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
A power-efficient adaptive heapsort for fpga-based image coding application (abstract only) 基于fpga的图像编码应用的高效自适应堆排序(摘要)
Yuhui Bai, S. Z. Ahmed, B. Granado
This paper presents an adaptive heap sort architecture for an image coding implementation on FPGA, which specifically addresses the issue of sorting different amount of data located in each subband during the coding. The proposed sorting architecture is easily scalable. Performance of the sorter only depends on the amount of data sorted. The efficient usage of dual port memories yields high throughput up to 50 Msamples/s and their adaptive trigger/shutdown provide the average dynamic power reduction up to 20.9%. We designed this architecture and incorporated it in our Adaptive Scanning of Wavelet Data (ASWD) module which reorganizes the wavelet coefficients into locally stationary sequences for a wavelet-based image encoder. We validated the hardware on an Altera's Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. The architectural innovations can also be exploited in other applications that require high throughput and scalable sorting. Our experiments show that compared to an embedded ARM CortexA9 processor running at 666 MHz, our architecture at 100 MHz can provide around 13X speedup while consuming 242 mW average core dynamic power.
本文提出了一种用于FPGA图像编码实现的自适应堆排序架构,该架构具体解决了编码过程中位于每个子带的不同数据量的排序问题。所建议的排序体系结构易于扩展。排序器的性能仅取决于排序的数据量。双端口存储器的有效使用可产生高达50 Msamples/s的高吞吐量,其自适应触发/关闭可提供高达20.9%的平均动态功耗降低。我们设计了这种结构,并将其整合到我们的小波数据自适应扫描(ASWD)模块中,该模块将小波系数重组为局部平稳序列,用于基于小波的图像编码器。我们在Altera的Stratix IV FPGA上验证了硬件作为基于Nios II处理器的片上系统的IP加速器。架构上的创新也可以用于其他需要高吞吐量和可扩展排序的应用程序。我们的实验表明,与运行在666 MHz的嵌入式ARM CortexA9处理器相比,我们的架构在100 MHz时可以提供大约13倍的加速,同时消耗242 mW的平均核心动态功率。
{"title":"A power-efficient adaptive heapsort for fpga-based image coding application (abstract only)","authors":"Yuhui Bai, S. Z. Ahmed, B. Granado","doi":"10.1145/2554688.2554746","DOIUrl":"https://doi.org/10.1145/2554688.2554746","url":null,"abstract":"This paper presents an adaptive heap sort architecture for an image coding implementation on FPGA, which specifically addresses the issue of sorting different amount of data located in each subband during the coding. The proposed sorting architecture is easily scalable. Performance of the sorter only depends on the amount of data sorted. The efficient usage of dual port memories yields high throughput up to 50 Msamples/s and their adaptive trigger/shutdown provide the average dynamic power reduction up to 20.9%. We designed this architecture and incorporated it in our Adaptive Scanning of Wavelet Data (ASWD) module which reorganizes the wavelet coefficients into locally stationary sequences for a wavelet-based image encoder. We validated the hardware on an Altera's Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. The architectural innovations can also be exploited in other applications that require high throughput and scalable sorting. Our experiments show that compared to an embedded ARM CortexA9 processor running at 666 MHz, our architecture at 100 MHz can provide around 13X speedup while consuming 242 mW average core dynamic power.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132479074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Implementing FPGA-based energy-efficient dense optical flow computation with high portability in C (abstract only) 基于fpga的高能效密集光流计算的C语言实现(仅摘要)
Zhibin Wang, Wenmin Yang, Jin Yu, Zhilei Chai
Optical flow computation is widely used in many video/image based applications such as motion detection, video compression etc. Dense optical flow field that provides more details of information is more useful in lots of applications. However, high-quality algorithms for dense optical flow computation are computationally expensive. For instance, on the ARM Cortex-A9 processor within ZYNQ, the popular linear variational method Combine-Brightness-Gradient (CBG), spends $26.68s per frame to compute optical flow when the image size is 640 x 480. It is difficult to be sped up especially when embedded systems with power constraints are considered. Poor portability is another factor to limit current implementations of optical flow computation to be used in more applications. In this paper, a high-performance, low-power FPGA-accelerated implementation of dense optical flow computation is presented. One high-quality dense optical flow method, the Combine-Brightness-Gradient model, is implemented. C code instead of VHDL/Verilog HDL is used to improve the productivity. Portability of the system is designed carefully for deploying it on different platforms conveniently. Experimental results show 12 fps and 0.38J per frame are achieved by this optical flow computing system when 640 x 480 image is used and optical flow for all pixels are computed. Furthermore, portability is demonstrated by implementing the optical flow algorithm on different heterogeneous platforms such as the ZYNQ-7000 SoC and the PC-FPGA platform with a Kintex-7 FPGA respectively.
光流计算广泛应用于许多基于视频/图像的应用,如运动检测、视频压缩等。密集的光流场提供了更多的细节信息,在许多应用中更为有用。然而,用于密集光流计算的高质量算法在计算上是昂贵的。例如,在ZYNQ的ARM Cortex-A9处理器上,当图像尺寸为640 x 480时,流行的线性变分方法组合亮度梯度(CBG)每帧花费26.68美元来计算光流。特别是在考虑功率限制的嵌入式系统时,很难加快速度。可移植性差是限制当前光流计算实现在更多应用中使用的另一个因素。本文提出了一种高性能、低功耗的密集光流计算fpga加速实现方案。实现了一种高质量的密集光流方法——组合亮度-梯度模型。使用C代码代替VHDL/Verilog HDL来提高生产率。系统的可移植性经过精心设计,便于在不同平台上部署。实验结果表明,当使用640 × 480图像并计算所有像素的光流时,该系统可实现12 fps和0.38J /帧的光流计算。此外,通过在不同的异构平台(如ZYNQ-7000 SoC和使用Kintex-7 FPGA的PC-FPGA平台)上实现光流算法,证明了该算法的可移植性。
{"title":"Implementing FPGA-based energy-efficient dense optical flow computation with high portability in C (abstract only)","authors":"Zhibin Wang, Wenmin Yang, Jin Yu, Zhilei Chai","doi":"10.1145/2554688.2554733","DOIUrl":"https://doi.org/10.1145/2554688.2554733","url":null,"abstract":"Optical flow computation is widely used in many video/image based applications such as motion detection, video compression etc. Dense optical flow field that provides more details of information is more useful in lots of applications. However, high-quality algorithms for dense optical flow computation are computationally expensive. For instance, on the ARM Cortex-A9 processor within ZYNQ, the popular linear variational method Combine-Brightness-Gradient (CBG), spends $26.68s per frame to compute optical flow when the image size is 640 x 480. It is difficult to be sped up especially when embedded systems with power constraints are considered. Poor portability is another factor to limit current implementations of optical flow computation to be used in more applications. In this paper, a high-performance, low-power FPGA-accelerated implementation of dense optical flow computation is presented. One high-quality dense optical flow method, the Combine-Brightness-Gradient model, is implemented. C code instead of VHDL/Verilog HDL is used to improve the productivity. Portability of the system is designed carefully for deploying it on different platforms conveniently. Experimental results show 12 fps and 0.38J per frame are achieved by this optical flow computing system when 640 x 480 image is used and optical flow for all pixels are computed. Furthermore, portability is demonstrated by implementing the optical flow algorithm on different heterogeneous platforms such as the ZYNQ-7000 SoC and the PC-FPGA platform with a Kintex-7 FPGA respectively.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130140292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
期刊
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1