
2013 International Conference on Field-Programmable Technology (FPT): latest papers

High throughput, tree automata based XML processing using FPGAs
Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718333
Reetinder P. S. Sidhu
A novel and efficient approach to XML processing using FPGAs, based upon the sound theoretical formalism of tree automata, is presented. The approach enables the key tasks of schema validation and query to be performed in a unified manner. A remarkably simple implementation of a tree automaton in hardware, as a pair of interacting automata with the states of one forming the input to the other, is described. The implementation can process one XML token in at most two clock cycles. This throughput is achieved for any schema grammar or query that can be accommodated in the state tables, independent of its complexity. Further, the use of tree automata offers greater expressive power for specifying schemas as well as queries than previous hardware based approaches. Detailed performance evaluation demonstrates significant throughput improvements of the proposed tree automata based approach over software as well as earlier FPGA based approaches. The implementation of XML schema validation on a mid-range FPGA provides sustained throughput from 1.7 to 3.1 Gbps, yielding a five to ten times speedup over an efficient software approach. Because the implementation is very compact, multiple instances can be used to improve throughput further.
Citations: 4
Fast Boolean matching based on NPN classification
Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718374
Zheng Huang, Lingli Wang, Yakov Nasikovskiy, A. Mishchenko
This paper proposes a fast algorithm for Boolean matching of completely specified Boolean functions. The algorithm is based on the NPN classification and can be applied on-the-fly to millions of small practical functions appearing in industrial designs, leading to runtime and memory reduction in logic synthesis and technology mapping. The algorithm is conceptually simpler, faster, and more scalable than previous work.
Citations: 47
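To make the NPN classification mentioned in the abstract above concrete: two Boolean functions are NPN-equivalent if one can be turned into the other by negating inputs (N), permuting inputs (P), and optionally negating the output (N). The sketch below is a brute-force illustration only, not the fast algorithm the paper proposes. It assumes a truth table encoded as an integer whose bit `m` holds the function value for input assignment `m`; the names `npn_canonical` and `npn_equivalent` are hypothetical.

```python
from itertools import permutations


def npn_canonical(tt: int, n: int) -> int:
    """Smallest truth table reachable from tt under all NPN transforms.

    Exhaustive enumeration: n! permutations * 2^n input negations * 2
    output polarities, so this is only practical for small n.
    """
    size = 1 << n
    mask = (1 << size) - 1
    best = None
    for perm in permutations(range(n)):
        for neg in range(size):  # bit i set: input i is complemented
            out = 0
            for m in range(size):
                x = m ^ neg  # apply input negations
                src = 0
                for i in range(n):  # apply input permutation
                    if (x >> i) & 1:
                        src |= 1 << perm[i]
                if (tt >> src) & 1:
                    out |= 1 << m
            for cand in (out, out ^ mask):  # optional output negation
                if best is None or cand < best:
                    best = cand
    return best


def npn_equivalent(a: int, b: int, n: int) -> bool:
    """Two functions match if they share the same canonical form."""
    return npn_canonical(a, n) == npn_canonical(b, n)
```

With this encoding, two-input AND (`0b1000`) and OR (`0b1110`) fall into the same class by De Morgan's law, while XOR (`0b0110`) does not. Matching by comparing canonical forms is what lets a library of cells be indexed by class; the cost of the enumeration above is the part a fast algorithm has to avoid.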
Spatio-Temporally-Shared Reconfigurable Fast Fourier Transform architecture design
Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718405
Hung-Lin Chao, Chun-Yang Peng, Cheng-Chien Wu, Ken-Shin Huang, Chun-Hsien Lu, Jih-Sheng Shen, Pao-Ann Hsiung
The Fast Fourier Transform (FFT) has been one of the most popular and widely used transform functions in communication hardware designs. With growing digital convergence, a single device needs to support multiple communication protocols, all of which need FFT computations. Currently, most FFT designs are either not shareable across applications or shareable only among a fixed set of applications. This work proposes a novel reconfigurable FFT design called Spatio-Temporally-shAred Reconfigurable Fast Fourier Transform (STARFFT), which leverages partial dynamic reconfiguration so that it can be shared across an arbitrary set of applications. STARFFT has a software driver that checks feasibility, schedules applications, and reconfigures the hardware. STARFFT hardware has several radix-2 pipelines that are time-multiplexed among applications, which significantly reduces hardware resource requirements and power consumption. Experimental results show that STARFFT can reduce the total hardware resource usage by nearly 88% and the power consumption requirements by about 90%.
Citations: 1
Transparent FPGA based device for SQL DDoS mitigation
Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718334
Karthikeyan Pandiyarajan, Srijith Haridas, Kuruvilla Varghese
A Distributed Denial-of-Service attack is an attempt to make a computer resource unavailable to its intended users. Typically, a large number of bots are triggered by an attacker simultaneously to create a huge load on a web server and bring it down. However, because SQL queries on a web server have large resource requirements, even a small number of queries from a smaller set of bots can create a huge load on the server. Such sophisticated application layer attacks go undetected by network security solutions under deployment today. Therefore, we propose an SQL DDoS Mitigator device that focuses on preventing such attacks targeting SQL database resources. It can parse packets at line speed, with a maximum latency of 20 μs for detecting HTTP GET packets with embedded SQL queries. The query pattern information for requester IP addresses is stored in a red-black tree data structure. Clients crossing the limit of server load, which is set dynamically on the basis of server state, are redirected to a CAPTCHA server for identification of bots. The IPs confirmed as bots are blacklisted for a configurable timeout period. The complete system, except the CAPTCHA server, is built on the “Xilinx Virtex-II Pro 50” FPGA based NetFPGA-1G platform. The device achieved a throughput of 400 kilopackets per second in a 1 Gbps network.
Citations: 5
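The decision flow described in the SQL DDoS abstract above (count SQL queries per requester IP, redirect clients over a dynamic limit to a CAPTCHA, blacklist confirmed bots for a timeout) can be sketched in software as follows. This is a hypothetical illustration, not the authors' design: the class and method names are invented, a plain Python `dict` stands in for the red-black tree, and the query count per IP is an assumed stand-in for the "query pattern information" the abstract mentions.

```python
class SqlQueryLimiter:
    """Hypothetical per-IP limiter mirroring the flow in the abstract."""

    def __init__(self, limit: int, blacklist_timeout: float):
        self.limit = limit                      # may be updated from server state
        self.blacklist_timeout = blacklist_timeout
        self.query_counts = {}                  # ip -> SQL queries seen (dict, not a red-black tree)
        self.blacklist = {}                     # ip -> time at which the ban expires

    def set_limit(self, limit: int) -> None:
        """Adjust the threshold, e.g. when server load changes."""
        self.limit = limit

    def on_sql_query(self, ip: str, now: float) -> str:
        """Classify one HTTP GET carrying an SQL query."""
        expiry = self.blacklist.get(ip)
        if expiry is not None:
            if now < expiry:
                return "drop"                   # still blacklisted
            del self.blacklist[ip]              # ban has expired
        count = self.query_counts.get(ip, 0) + 1
        self.query_counts[ip] = count
        if count > self.limit:
            return "captcha"                    # ask the client to prove it is human
        return "forward"

    def on_captcha_result(self, ip: str, is_bot: bool, now: float) -> None:
        """Reset the counter; blacklist the IP if the CAPTCHA identified a bot."""
        self.query_counts.pop(ip, None)
        if is_bot:
            self.blacklist[ip] = now + self.blacklist_timeout
```

A balanced tree such as the red-black tree named in the abstract gives bounded worst-case lookup time, which matters for a fixed-latency hardware pipeline; the hash-based `dict` here is only a convenience for the sketch.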
Fast simulation of Digital Spiking Silicon Neuron model employing reconfigurable dataflow computing
Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718420
Will X. Y. Li, Shridhar Choudhary, R. Cheung, Takeshi Matsumoto, M. Fujita
A new simulation scheme for the Digital Spiking Silicon Neuron (DSSN) model is proposed. This scheme is based on the reconfigurable dataflow computing paradigm and targets the Maxeler MaxWorkstation. Compared to the previous implementation of the DSSN network, the new scheme offers better flexibility and better programmability. More importantly, computing with dataflow cores takes good advantage of the intrinsic parallelism of the reconfigurable hardware, and better pipelining is achievable. The proposed scheme has good potential for large-scale, fast simulation of DSSN-model-based networks, which is pivotal to future neuroscience research.
Citations: 4
Enhancing communication on automotive networks using data layer extensions
Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718417
Shanker Shreejith, Suhaib A. Fahmy
Automotive systems function within a distributed computing paradigm consisting of networks of sensors, actuators and processing units. More advanced functions are finding their way into the automotive domain, making computing and networking more complex. While safety, security and determinism are primary concerns for many systems, the networking protocols do not provide extensions to address them, and it is left to application designers to tackle these issues. Standardising simple features like time-stamping of messages and health status flags can help improve robustness and mitigate risks associated with replay attacks. It is also possible to integrate further protection, like encryption of messages, addressing the increasing security concerns in this domain. In this paper, we demonstrate a systematic way of accommodating such enhancements within a standard automotive network that retains interoperability with existing systems. We show how such enhancements can be made possible in both software and hardware to help add functionality above the core network specification. We also show that the enhancements incorporated at the hardware layer offer 20× better performance than the software-based approach.
Citations: 3
A high-speed FFT based on a six-step algorithm: Applied to a radio telescope for a solar radio burst
Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718406
Hiroki Nakahara, K. Iwai, H. Nakanishi
A radio telescope analyzes the radio frequency (RF) received from celestial objects. It consists of an antenna, a receiver, and a spectrometer. The spectrometer converts the time domain into the frequency domain with an FFT operation. A solar radio burst observation requires a high-speed FFT. This paper proposes a P parallel N point FFT for fixed point data based on a six-step algorithm. We analyze the hardware resources for the P parallel N point FFT. We implemented 32 parallel N point FFT circuits on a Xilinx Virtex 7 VC707 board. Comparison with the existing FFT implementations shows that the proposed one is 4.52-22.64 times faster.
Citations: 3
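The six-step algorithm named in the abstract above factors an N-point FFT with N = N1 × N2 into short row transforms separated by matrix transposes and one twiddle-factor multiplication. The following is a minimal software sketch of that general decomposition, not the paper's fixed-point hardware design. The function names are hypothetical, floating-point complex arithmetic is used instead of fixed point, and a naive DFT stands in for the short row FFTs to keep the example small.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used here for the short row transforms."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(rows):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*rows)]


def six_step_fft(x, n1, n2):
    """N-point DFT of the list x (N = n1 * n2) via the six-step decomposition."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n2 x n1 row-major matrix: a[r][c] = x[c + n1 * r].
    a = [x[r * n1:(r + 1) * n1] for r in range(n2)]
    b = transpose(a)                                   # step 1: n1 rows of length n2
    c = [dft(row) for row in b]                        # step 2: n1 transforms of length n2
    c = [                                              # step 3: twiddle factors W_N^(i*k)
        [c[i][k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n2)]
        for i in range(n1)
    ]
    d = transpose(c)                                   # step 4: n2 rows of length n1
    e = [dft(row) for row in d]                        # step 5: n2 transforms of length n1
    f = transpose(e)                                   # step 6: restore natural order
    return [v for row in f for v in row]
```

The row transforms in steps 2 and 5 are independent of one another, which is the property a P-parallel implementation can exploit by running P of them at once.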
A low power reconfigurable accelerator using a back-gate bias control technique
Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718395
Hong-rui Su, Weihan Wang, K. Kitamori, H. Amano
Leakage power is a serious problem, especially for accelerators that use a large Processing Element (PE) array. Here, a low power reconfigurable accelerator called Cool Mega Array (CMA) with back-gate bias control (CMA-bb) is implemented and evaluated. In CMA-bb, the back-gate bias of the microcontroller and the PE array can be controlled independently. In the idle mode, reverse bias is applied to both parts to suppress the leakage current. When high performance is required, forward bias is used to increase the clock frequency. For simple applications, the operational power can be suppressed by using reverse bias only in the PE array. The real chip is implemented with a 65 nm experimental process for low leakage applications. The evaluation results show that the leakage current can be suppressed to 300 μA by using the reverse bias. With forward bias, the operational frequency increases from 39 MHz to 50 MHz, with up to a 21% increase in operational power. For simple applications, 8% to 9.4% of operational power is saved by applying reverse bias only to the PE array.
Citations: 4
NFA reduction for regular expressions matching using FPGA
Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718381
V. Kosar, M. Zádník, J. Korenek
Many algorithms have been proposed to accelerate regular expression matching by mapping a nondeterministic finite automaton (NFA) into a circuit implemented in an FPGA. These algorithms exploit unique features of the FPGA to achieve high throughput. On the other hand, the limited resources of the FPGA limit the number of regular expressions that can be supported. In this paper, we investigate the applicability of NFA reduction techniques, a formal apparatus for reducing the number of states and transitions in an NFA prior to its mapping into an FPGA. The paper presents several NFA reduction techniques, each with a different reduction power and time complexity. The evaluation uses regular expressions from Snort and the L7 decoder. The best NFA reduction algorithms achieve more than 66% reduction in the number of states for a Snort ftp module. Such a reduction translates directly into a 66% saving of LUT-FF pairs in the FPGA.
Citations: 12
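As one concrete example of what "reducing the number of states" in an NFA can mean, the sketch below merges bisimilar states by partition refinement: two states end up in the same block when they agree on acceptance and, for every symbol, reach the same set of blocks. Merging such states preserves the accepted language. This is a standard textbook reduction chosen for illustration; the listing does not say which techniques the paper evaluates, and the names `reduce_nfa` and `accepts` and the dictionary encoding of the transition relation are assumptions.

```python
def reduce_nfa(states, start, accepting, delta):
    """Merge bisimilar states of an epsilon-free NFA.

    delta maps (state, symbol) to a set of successor states.
    Returns (states, start, accepting, delta) of the quotient automaton.
    """
    symbols = sorted({sym for (_, sym) in delta})
    # Initial partition: accepting versus non-accepting states.
    block = {s: int(s in accepting) for s in states}
    while True:
        ids = {}
        refined = {}
        for s in states:
            # Signature: current block plus the blocks reachable per symbol.
            sig = (
                block[s],
                tuple(
                    frozenset(block[t] for t in delta.get((s, a), ()))
                    for a in symbols
                ),
            )
            refined[s] = ids.setdefault(sig, len(ids))
        # Each round only splits blocks, so an unchanged count means stable.
        stable = len(set(refined.values())) == len(set(block.values()))
        block = refined
        if stable:
            break
    new_delta = {}
    for (s, a), targets in delta.items():
        new_delta.setdefault((block[s], a), set()).update(block[t] for t in targets)
    return (
        sorted(set(block.values())),
        block[start],
        {block[s] for s in accepting},
        new_delta,
    )


def accepts(start, accepting, delta, word):
    """Simulate the NFA on word by tracking the set of active states."""
    current = {start}
    for ch in word:
        current = {t for s in current for t in delta.get((s, ch), ())}
    return bool(current & accepting)
```

For the expression `(a|b)c` built with one state per branch (four states), the two middle states are merged and three states remain. In a one-hot FPGA mapping where each state occupies a flip-flop plus its transition logic, fewer states means fewer LUT-FF pairs, which is the link the abstract draws.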
Hardware acceleration for the banded Smith-Waterman algorithm with the cycled systolic array
Pub Date : 2013-12-01 DOI: 10.1109/FPT.2013.6718421
Peng Chen, Chao Wang, Xi Li, Xuehai Zhou
The Smith-Waterman algorithm is one of the most popular algorithms in molecular sequence alignment. It is often used to find the best local alignment between two strings by calculating the similarity score of the pair of strings. The algorithm has great potential for parallelization and has been employed by many FPGA-based solutions, mostly in the form of a systolic array. However, architecture designers always find the number of processing elements (PEs) in their implementation limited by the resources available on the FPGA device. When a large number of PEs is required, they either decompose the problem or fold the implementation of their application. In this paper, we put forward a novel FPGA-based architecture that addresses the problem by using a bounded number of PEs to realize a systolic array of any length. It is mainly based on the idea of the banded Smith-Waterman algorithm, with the key distinction that it reuses the PEs that fall outside the band boundary. Analysis shows that the approach is as fast as the normal systolic fabric and obtains a considerable resource reduction.
Citations: 13
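The banding idea referred to in the abstract above restricts the Smith-Waterman scoring matrix to cells near the main diagonal, so each row needs at most `2 * band + 1` cell updates regardless of sequence length. The sketch below shows only that restriction in software and makes several assumptions: a linear gap penalty, illustrative scoring values, and a hypothetical function name. It does not model the systolic array or the PE reuse scheme the paper proposes.

```python
def banded_smith_waterman(a, b, band, match=2, mismatch=-1, gap=-1):
    """Best local alignment score using only cells with |i - j| <= band.

    Cells outside the band are treated as 0, so an alignment cannot
    pass through them.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        lo = max(1, i - band)          # first column inside the band
        hi = min(len(b), i + band)     # last column inside the band
        for j in range(lo, hi + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                     # local alignment may restart anywhere
                prev[j - 1] + sub,     # match or mismatch
                prev[j] + gap,         # gap in b
                cur[j - 1] + gap,      # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best
```

Aligning `"ACGT"` against `"AACGT"` illustrates the trade-off: with `band=0` only the main diagonal is scored and the shifted match is missed, while `band=1` is wide enough to follow the diagonal offset by one and recover the full four-base match.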