Pub Date: 2013-12-01. DOI: 10.1109/FPT.2013.6718333
Reetinder P. S. Sidhu
A novel and efficient approach to XML processing using FPGAs, based upon the sound theoretical formalism of tree automata, is presented. The approach enables the key tasks of schema validation and query to be performed in a unified manner. A remarkably simple implementation of a tree automaton in hardware, as a pair of interacting automata with the states of one forming the input to the other, is described. The implementation can process one XML token in at most two clock cycles. This throughput is achieved for any schema grammar or query that can be accommodated in the state tables, independent of its complexity. Further, the use of tree automata offers greater expressive power for specifying schemas and queries than previous hardware-based approaches. Detailed performance evaluation demonstrates significant throughput improvements of the proposed tree-automata-based approach over software as well as earlier FPGA-based approaches. The implementation of XML schema validation on a mid-range FPGA provides sustained throughput from 1.7 to 3.1 Gbps, yielding a five to ten times speedup over an efficient software approach. Because the implementation is very compact, multiple instances can be used to improve throughput further.
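As a loose software analogy for the "pair of interacting automata" idea, the following minimal sketch validates a token stream with one small DFA per open element (reading its children) and a stack step that feeds each closed element's result to its parent's DFA. The schema, the tag names, and the token format are hypothetical; this is not the paper's hardware design, and text content and attributes are ignored.

```python
# Hypothetical toy schema: each tag maps to a DFA over the tags of its children,
# given as (start state, accepting states, transitions {(state, child): state}).
SCHEMA = {
    "book": (0, {2}, {(0, "title"): 1, (1, "author"): 2, (2, "author"): 2}),
    "title": (0, {0}, {}),
    "author": (0, {0}, {}),
}


def validate(tokens):
    """tokens: iterable of ("open", tag) / ("close", tag) pairs."""
    stack = []           # one (tag, content-model DFA state) per open element
    root_closed = False
    for kind, tag in tokens:
        if kind == "open":
            if tag not in SCHEMA or (root_closed and not stack):
                return False
            stack.append((tag, SCHEMA[tag][0]))
        else:
            if not stack or stack[-1][0] != tag:
                return False          # mismatched or stray close tag
            _, h_state = stack.pop()
            if h_state not in SCHEMA[tag][1]:
                return False          # children did not satisfy the content model
            if stack:
                # The closed element becomes one input symbol of its parent's DFA.
                parent, p_state = stack.pop()
                nxt = SCHEMA[parent][2].get((p_state, tag))
                if nxt is None:
                    return False
                stack.append((parent, nxt))
            else:
                root_closed = True
    return root_closed and not stack
```

Each token triggers a constant amount of work (one table lookup on open, at most two on close), which is the property that makes a fixed per-token cycle budget plausible in hardware.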
Title: High throughput, tree automata based XML processing using FPGAs. Venue: 2013 International Conference on Field-Programmable Technology (FPT).
Pub Date: 2013-12-01. DOI: 10.1109/FPT.2013.6718374
Zheng Huang, Lingli Wang, Yakov Nasikovskiy, A. Mishchenko
This paper proposes a fast algorithm for Boolean matching of completely specified Boolean functions. The algorithm is based on the NPN classification and can be applied on-the-fly to millions of small practical functions appearing in industrial designs, leading to runtime and memory reduction in logic synthesis and technology mapping. The algorithm is conceptually simpler, faster, and more scalable than previous work.
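To make the notion of NPN classification concrete, here is a minimal sketch that computes a canonical representative by exhaustive enumeration: it tries every input permutation, every input negation, and the optional output negation, and keeps the smallest truth table. This brute-force search is only an illustration for very small functions and is not the fast algorithm the paper proposes; the function name `npn_canonical` and the truth-table encoding are assumptions.

```python
from itertools import permutations


def npn_canonical(tt, n):
    """Smallest truth table (as an int) over all NPN transforms of an n-input function.

    Bit m of tt holds f(m), where bit i of the minterm index m is the value of input i.
    """
    size = 1 << n                      # number of minterms
    mask = (1 << size) - 1             # all-ones truth table, used for output negation
    best = None
    for perm in permutations(range(n)):
        for neg in range(1 << n):      # bitmask of inputs to complement
            g = 0
            for m in range(size):
                x = m ^ neg            # complement the selected inputs
                src = 0
                for i in range(n):     # route input i of g to input perm[i] of f
                    if (x >> i) & 1:
                        src |= 1 << perm[i]
                if (tt >> src) & 1:
                    g |= 1 << m
            for cand in (g, g ^ mask):  # with and without output negation
                if best is None or cand < best:
                    best = cand
    return best
```

Two functions are NPN-equivalent exactly when their canonical forms coincide, so Boolean matching reduces to comparing (or hashing) these representatives. For example, 2-input AND (`0b1000`) and OR (`0b1110`) share a class by De Morgan's law, while XOR (`0b0110`) does not.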
Title: Fast Boolean matching based on NPN classification. Venue: 2013 International Conference on Field-Programmable Technology (FPT).
The Fast Fourier Transform (FFT) has been one of the most popular and widely used transform functions in communication hardware designs. With growing digital convergence, a single device needs to support multiple communication protocols, all of which need FFT computations. Currently, most FFT designs are either not shareable across applications or shareable only among a fixed set of applications. This work proposes a novel reconfigurable FFT design called Spatio-Temporally-shAred Reconfigurable Fast Fourier Transform (STARFFT), which leverages partial dynamic reconfiguration technology so that it can be shared across an arbitrary set of applications. STARFFT has a software driver that checks feasibility, schedules applications, and reconfigures the hardware. STARFFT hardware has several radix-2 pipelines that are time-multiplexed among applications, achieving significant reductions in hardware resource requirements and in power consumption. Experimental results show that STARFFT can reduce the total hardware resource usage by nearly 88% and the power consumption requirements by about 90%.
Title: Spatio-Temporally-Shared Reconfigurable Fast Fourier Transform architecture design. Authors: Hung-Lin Chao, Chun-Yang Peng, Cheng-Chien Wu, Ken-Shin Huang, Chun-Hsien Lu, Jih-Sheng Shen, Pao-Ann Hsiung. Pub Date: 2013-12-01. DOI: 10.1109/FPT.2013.6718405. Venue: 2013 International Conference on Field-Programmable Technology (FPT).
A Distributed Denial-of-Service attack is an attempt to make a computer resource unavailable to its intended users. Typically, a large number of bots are triggered by an attacker simultaneously to create a huge load on a web server and bring it down. However, when processing SQL queries on a web server, owing to huge resource requirements, even a small number of queries from a smaller set of bots can create a huge load on the server. Such sophisticated application layer attacks go undetected by network security solutions under deployment today. Therefore, we propose an SQL DDoS Mitigator device that focuses on preventing such attacks targeting SQL database resources. It can parse packets at line speed, with a maximum latency of 20μs for detecting HTTP GET packets with embedded SQL queries. The query pattern information for requester IP addresses is stored in a red-black tree data structure. Clients crossing the limit of server load, dynamically set on the basis of server state, will be redirected to a CAPTCHA server for identification of bots. The IPs confirmed as bots are blacklisted for a configurable timeout period. The complete system, except the CAPTCHA server, is built on the “Xilinx Virtex-II Pro 50” FPGA based NetFPGA-1G platform. The device achieved a throughput of 400 Kilo Packets/s in a 1 Gbps network.
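The per-client decision flow described above (count queries, redirect to a CAPTCHA past a limit, blacklist confirmed bots for a timeout) can be sketched in software as follows. This is a simplified illustration under stated assumptions: a Python dict stands in for the red-black tree, the limit is a fixed number rather than one derived from server state, and the class and method names are hypothetical.

```python
class Mitigator:
    def __init__(self, limit, timeout):
        self.limit = limit        # assumed fixed cap on SQL queries per client
        self.timeout = timeout    # blacklist duration, in the same units as `now`
        self.counts = {}          # ip -> number of SQL queries seen
        self.blacklist = {}       # ip -> time at which the blacklist entry expires

    def on_query(self, ip, now):
        """Decide what to do with an SQL-bearing request from `ip` at time `now`."""
        expiry = self.blacklist.get(ip)
        if expiry is not None:
            if now < expiry:
                return "drop"
            del self.blacklist[ip]            # timeout elapsed, give the client a fresh start
        self.counts[ip] = self.counts.get(ip, 0) + 1
        return "captcha" if self.counts[ip] > self.limit else "forward"

    def confirm_bot(self, ip, now):
        """Called when the CAPTCHA check identifies `ip` as a bot."""
        self.blacklist[ip] = now + self.timeout
        self.counts.pop(ip, None)
```

A balanced tree such as the red-black tree named in the abstract gives logarithmic worst-case lookups with bounded memory behaviour, which is easier to guarantee in hardware than a hash table's worst case.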
Title: Transparent FPGA based device for SQL DDoS mitigation. Authors: Karthikeyan Pandiyarajan, Srijith Haridas, Kuruvilla Varghese. Pub Date: 2013-12-01. DOI: 10.1109/FPT.2013.6718334. Venue: 2013 International Conference on Field-Programmable Technology (FPT).
Pub Date: 2013-12-01. DOI: 10.1109/FPT.2013.6718420
Will X. Y. Li, Shridhar Choudhary, R. Cheung, Takeshi Matsumoto, M. Fujita
A new simulation scheme for the Digital Spiking Silicon Neuron (DSSN) model is proposed. The scheme is based on the reconfigurable dataflow computing paradigm and targets the Maxeler MaxWorkstation. Compared with the previous implementation of the DSSN network, the new scheme offers better flexibility and programmability. More importantly, computing with dataflow cores takes good advantage of the intrinsic parallelism of the reconfigurable hardware, and better pipelining is achievable. The proposed scheme shows good potential for large-scale, fast simulation of DSSN-model-based networks, which is pivotal to future neuroscience research.
Title: Fast simulation of Digital Spiking Silicon Neuron model employing reconfigurable dataflow computing. Venue: 2013 International Conference on Field-Programmable Technology (FPT).
Pub Date: 2013-12-01. DOI: 10.1109/FPT.2013.6718417
Shanker Shreejith, Suhaib A. Fahmy
Automotive systems function within a distributed computing paradigm consisting of networks of sensors, actuators and processing units. More advanced functions are finding their way into the automotive domain, making computing and networking more complex. While safety, security and determinism are primary concerns for many systems, the networking protocols do not provide extensions to address them, and it is left to application designers to tackle these issues. Standardising simple features like time-stamping of messages and health status flags can help improve robustness and mitigate risks associated with replay attacks. It is also possible to integrate further protection, like encryption of messages, addressing the increasing security concerns in this domain. In this paper, we demonstrate a systematic way of accommodating such enhancements within a standard automotive network that retains interoperability with existing systems. We show how such enhancements can be made possible in both software and hardware to help add functionality above the core network specification. We also show that the enhancements incorporated at the hardware layer offer 20× better performance than the software-based approach.
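The way message time-stamps help against replay attacks can be illustrated with a small receiver-side check: a message is accepted only if it is recent enough and strictly newer than the last accepted message with the same identifier. This is a generic sketch, not the extension format defined in the paper; the function names, the `max_age` parameter, and the assumption of a shared time base between sender and receiver are all hypothetical.

```python
def make_receiver(max_age):
    """Return an accept(msg_id, timestamp, now) function with per-ID replay state."""
    last_seen = {}   # msg_id -> timestamp of the last accepted message

    def accept(msg_id, timestamp, now):
        if now - timestamp > max_age:
            return False                       # stale: older than the freshness window
        if timestamp <= last_seen.get(msg_id, float("-inf")):
            return False                       # replayed or out-of-order message
        last_seen[msg_id] = timestamp
        return True

    return accept
```

A captured frame that is re-sent later fails one of the two checks: within the window it is rejected as a duplicate, and after the window it is rejected as stale.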
Title: Enhancing communication on automotive networks using data layer extensions. Venue: 2013 International Conference on Field-Programmable Technology (FPT).
Pub Date: 2013-12-01. DOI: 10.1109/FPT.2013.6718406
Hiroki Nakahara, K. Iwai, H. Nakanishi
A radio telescope analyzes the radio frequency (RF) received from celestial objects. It consists of an antenna, a receiver, and a spectrometer. The spectrometer converts the time domain into the frequency domain with an FFT operation. A solar radio burst observation requires a high-speed FFT. This paper proposes a P parallel N point FFT for fixed point data based on a six-step algorithm. We analyze the hardware resources for the P parallel N point FFT. We implemented 32 parallel N point FFT circuits on a Xilinx Virtex 7 VC707 board. Comparison with the existing FFT implementations shows that the proposed one is 4.52-22.64 times faster.
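For readers unfamiliar with the six-step algorithm, the sketch below shows its general structure in software: an N-point transform with N = n1 × n2 is computed as a transpose, short row FFTs, a twiddle-factor multiplication, a second transpose, more row FFTs, and a final transpose. It uses floating-point complex numbers and a plain recursive radix-2 FFT for the rows, so it does not model the paper's fixed-point arithmetic or its P-parallel hardware; the function names are assumptions.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(m):
    return [list(row) for row in zip(*m)]


def six_step_fft(x, n1, n2):
    """DFT of the list x (length n1 * n2, both powers of two) via the six-step scheme."""
    n = n1 * n2
    assert len(x) == n
    a = [x[r * n2:(r + 1) * n2] for r in range(n1)]    # view x as an n1 x n2 matrix
    a = transpose(a)                                    # step 1: n2 x n1
    a = [fft(row) for row in a]                         # step 2: n2 FFTs of length n1
    a = [[a[j][k] * cmath.exp(-2j * cmath.pi * j * k / n)
          for k in range(n1)] for j in range(n2)]       # step 3: twiddle factors
    a = transpose(a)                                    # step 4: n1 x n2
    a = [fft(row) for row in a]                         # step 5: n1 FFTs of length n2
    a = transpose(a)                                    # step 6: n2 x n1
    return [v for row in a for v in row]                # row-major order is natural order
```

The appeal for hardware is that steps 2 and 5 consist of many independent short FFTs, which can be distributed over parallel units, while the transposes turn strided memory accesses into contiguous ones.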
Title: A high-speed FFT based on a six-step algorithm: Applied to a radio telescope for a solar radio burst. Venue: 2013 International Conference on Field-Programmable Technology (FPT).
Pub Date: 2013-12-01. DOI: 10.1109/FPT.2013.6718395
Hong-rui Su, Weihan Wang, K. Kitamori, H. Amano
Leakage power is a serious problem, especially for accelerators that use a large Processing Element (PE) array. Here, a low power reconfigurable accelerator called Cool Mega Array (CMA) with back-gate bias control (CMA-bb) is implemented and evaluated. In CMA-bb, the back-gate bias of the microcontroller and the PE array can be controlled independently. In idle mode, reverse bias is applied to both parts to suppress the leakage current. When high performance is required, forward bias is used to increase the clock frequency. For simple applications, the operational power can be reduced by applying reverse bias only to the PE array. The real chip is implemented with a 65nm experimental process for low leakage applications. The evaluation results show that the leakage current can be suppressed to 300μA by using the reverse bias. With forward bias, the operational frequency increases from 39MHz to 50MHz at the cost of up to a 21% increase in operational power. For simple applications, 8% to 9.4% of operational power is saved by applying reverse bias only to the PE array.
Title: A low power reconfigurable accelerator using a back-gate bias control technique. Venue: 2013 International Conference on Field-Programmable Technology (FPT).
Pub Date: 2013-12-01. DOI: 10.1109/FPT.2013.6718381
V. Kosar, M. Zádník, J. Korenek
Many algorithms have been proposed to accelerate regular expression matching by mapping a nondeterministic finite automaton (NFA) into a circuit implemented in an FPGA. These algorithms exploit unique features of the FPGA to achieve high throughput. On the other hand, the FPGA's limited resources bound the number of regular expressions that can be supported. In this paper, we investigate the applicability of NFA reduction techniques, a formal apparatus for reducing the number of states and transitions in an NFA prior to its mapping into an FPGA. The paper presents several NFA reduction techniques, each with a different reduction power and time complexity. The evaluation uses regular expressions from Snort and the L7 decoder. The best NFA reduction algorithms achieve more than a 66% reduction in the number of states for a Snort ftp module. Such a reduction translates directly into a 66% saving in LUT-FF pairs in the FPGA.
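One well-known family of NFA reductions merges states that behave identically going forward. The sketch below implements a simple version of this idea by partition refinement (a bisimulation-style quotient): states start out split into accepting and non-accepting, and blocks are refined until every state in a block has the same set of (symbol, target block) pairs. It is offered as an illustration of the general approach; the abstract does not say which specific techniques the paper evaluates, and the function name and data layout are assumptions.

```python
def reduce_nfa(states, accepting, delta):
    """Group NFA states that can be merged without changing the accepted language.

    delta maps (state, symbol) -> set of successor states.
    Returns a dict state -> block id; states with the same id can be merged.
    """
    block = {s: int(s in accepting) for s in states}
    while True:
        # Signature: current block plus the (symbol, successor block) pairs reachable in one step.
        sig = {
            s: (block[s],
                frozenset((a, block[t])
                          for (p, a), targets in delta.items() if p == s
                          for t in targets))
            for s in states
        }
        ids = {}
        new_block = {s: ids.setdefault(sig[s], len(ids)) for s in states}
        if len(ids) == len(set(block.values())):
            return new_block          # no block was split, so the partition is stable
        block = new_block
```

For a five-state NFA with edges 0 -a-> 1, 0 -a-> 2, 1 -b-> 3, 2 -b-> 4 and accepting states 3 and 4, the procedure groups {1, 2} and {3, 4}, leaving three states. In a one-flip-flop-per-state FPGA mapping, every merged state removes the logic that implemented it.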
Title: NFA reduction for regular expressions matching using FPGA. Venue: 2013 International Conference on Field-Programmable Technology (FPT).
Pub Date: 2013-12-01. DOI: 10.1109/FPT.2013.6718421
Peng Chen, Chao Wang, Xi Li, Xuehai Zhou
The Smith-Waterman algorithm is one of the most popular algorithms in molecular sequence alignment. It is often used to find the best local alignment between two strings by calculating the similarity score of the pair. The algorithm has great potential for parallelization and has been employed in many FPGA-based solutions, mostly in the form of a systolic array. However, architecture designers often find the number of processing elements (PEs) in their implementation limited by the resources available on the FPGA device. When a large number of processing elements is required, they either decompose the problem or fold the implementation. In this paper, we put forward a novel FPGA-based architecture that addresses the problem by using a bounded number of PEs to realize a systolic array of any length. It is mainly based on the idea of the banded Smith-Waterman algorithm, with the key distinction that it reuses the PEs that fall outside the band boundary. Analysis shows that the approach is as fast as the normal systolic fabric and achieves a considerable resource reduction.
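The banded variant that the architecture builds on can be sketched in software as follows: only cells within a fixed distance of the main diagonal are computed, so the work per row is bounded by the band width rather than by the sequence length. This sketch uses a linear gap penalty and scoring values chosen for illustration; it does not model the cycled systolic array or its PE reuse, and the function and parameter names are assumptions.

```python
def banded_smith_waterman(a, b, band, match=2, mismatch=-1, gap=-1):
    """Best local alignment score using only cells (i, j) with |i - j| <= band."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)          # cells outside the band stay at 0
        lo = max(1, i - band)
        hi = min(len(b), i + band)
        for j in range(lo, hi + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0,
                         prev[j - 1] + s,   # diagonal: align a[i-1] with b[j-1]
                         prev[j] + gap,     # gap in b
                         cur[j - 1] + gap)  # gap in a
            best = max(best, cur[j])
        prev = cur
    return best
```

At most 2 × band + 1 cells are active in any row, which is why a bounded number of PEs suffices in hardware. The trade-off is that alignments lying far from the main diagonal are missed: aligning "AAAACGT" with "CGT" scores 6 with `band=4` but 0 with `band=1`.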
Title: Hardware acceleration for the banded Smith-Waterman algorithm with the cycled systolic array. Venue: 2013 International Conference on Field-Programmable Technology (FPT).