Enabling Hardware Exploration in Software-Defined Networking: A Flexible, Portable OpenFlow Switch
Asif Khan, Nirav H. Dave
doi: 10.1109/FCCM.2013.15

The OpenFlow framework allows the data plane of a network switch to be managed by a software-based controller. This enables a software-defined networking model in which sophisticated network management policies can be deployed. In this paper, we present an FPGA-based switch that is fully compliant with OpenFlow 1.0 and meets the 10 Gbps line rate. The switch design is both modular and highly parametrized. It has generic split-transaction interfaces and isolated platform-specific features, making it both flexible for architectural exploration and portable across FPGA platforms. The flow tables in the switch can be implemented in Block RAM or DRAM without any modifications to the rest of the design. The switch has been ported to the NetFPGA-10G, ML605, and DE4 boards. It can be integrated with a desktop PC via either PCIe or a serial link, and with an FPGA-based MIPS64 softcore as a coprocessor. The latter FPGA-based switch-processor system provides an ideal platform for network research in which both the data plane and the control plane can be explored.
Exploring Manycore Multinode Systems for Irregular Applications with FPGA Prototyping
Marco Ceriani, G. Palermo, Simone Secchi, Antonino Tumeo, Oreste Villa
doi: 10.1109/FCCM.2013.62

We propose an intermediate approach between full-custom hardware systems and full-software tools. Figure 1 shows an overview of the proposed architecture. We start from an off-the-shelf architecture composed of simple, in-order cores and an on-chip interconnect. The on-chip interconnect connects the processing cores to the memory controller for the external memory (DDR3) and to the shared I/O peripherals. We add three custom components: the Global Memory Access Scheduler (GMAS), the Global Network Interface (GNI), and the Global SYNChronization module (GSYNC). The GMAS enables support for the scrambled address space. It also implements part of the support for latency tolerance by storing remote memory operations, and acts as a scheduler for lightweight software multithreading.
Accelerating Join Operation for Relational Databases with FPGAs
R. Halstead, Bharat Sukhwani, Hong Min, Mathew S. Thoennes, Parijat Dube, S. Asaad, B. Iyer
doi: 10.1109/FCCM.2013.17

In this paper, we investigate the use of field programmable gate arrays (FPGAs) to accelerate relational joins. Relational join is one of the most CPU-intensive, yet commonly used, database operations. Hashing can be used to reduce the time complexity from quadratic (naïve) to linear time. However, doing so can introduce false positives to the results which must be resolved. We present a hash-join engine on FPGA that performs hashing, conflict resolution, and joining on a PCIe-attached system, achieving greater than 11x speedup over software.
A Configurable Architecture for a Visual Saliency System and Its Application in Retail
Nandhini Chandramoorthy, Siddharth Advani, K. Irick, N. Vijaykrishnan
doi: 10.1109/FCCM.2013.41

The objective of this paper is to present a configurable architecture for a visual saliency model based on AIM. It presents algorithmic enhancements to AIM that facilitate the design of a performance-efficient hardware architecture offering tradeoffs between accuracy, resource utilization, and latency. The AIM computational model involves (1) extraction of a set of coefficient features for each local patch in an image, (2) estimation of the probability density of each coefficient with respect to its local surround, (3) computation of their product to give a joint likelihood, and (4) computation of the self-information of each pixel from its log-likelihood. Calculating the likelihood with respect to each pixel individually in a local surround is computationally expensive. The paper proposes approximating the contribution of pixels in the surround in terms of "cells" grouped further into "support zones", whose widths are configurable. This approximation leads to nearly a 10x reduction in the number of multipliers, a critical resource, for a 41x41 surround size.
Accuracy-Performance Tradeoffs on an FPGA through Overclocking
Kan Shi, D. Boland, G. Constantinides
doi: 10.1109/FCCM.2013.10

Embedded applications often impose stringent latency requirements. While a high degree of parallelism within custom FPGA-based accelerators may help to some extent, it may also be necessary to limit the precision used in the datapath to boost the operating frequency of the implementation. However, by reducing the precision, the engineer introduces quantization error into the design. In this paper, we demonstrate that for many applications it would be preferable to simply overclock the design and accept that timing violations may arise. Since the errors introduced by timing violations occur rarely, they cause less noise than quantization errors. Through the use of analytical models and empirical results on a Xilinx Virtex-6 FPGA, we show that a geometric mean reduction of 67.9% to 98.8% in error expectation, or a geometric mean improvement of 3.1% to 27.6% in operating frequency, can be obtained using this alternative design methodology.
A Delay-based PUF Design Using Multiplexers on FPGA
Miaoqing Huang, Shiming Li
doi: 10.1109/FCCM.2013.11

Physically unclonable functions (PUFs) have been a hot research topic in hardware-oriented security for many years. Given a challenge as input, a PUF generates a corresponding response, which can be treated as a unique fingerprint or signature for authentication purposes. In this paper, a delay-based PUF design using multiplexers on an FPGA is presented. Due to the intrinsic difference in the switching latencies of two chained multiplexers, a positive pulse may be produced at the output of the downstream multiplexer. This pulse can be used to set the output of a D flip-flop to '1'. The proposed design improves the randomness of the outputs of the PUF.
Efficient Large Integer Squarers on FPGA
Simin Xu, Suhaib A. Fahmy, I. Mcloughlin
doi: 10.1109/FCCM.2013.35

This paper presents an optimised high-throughput architecture for integer squaring on FPGAs. The approach reduces the number of DSP blocks required compared to a standard multiplier. Previous work proposed the tiling method for double-precision squaring, which uses the fewest DSP blocks to date. However, that approach incurs a large overhead in terms of look-up table (LUT) consumption and has a complex, irregular structure that does not scale to larger word sizes. The architecture proposed in this paper reduces DSP block usage by an amount equivalent to the tiling method while incurring a much lower LUT overhead: 21.8% fewer LUTs for a 53-bit squarer. The architecture is mapped to a Xilinx Virtex-6 FPGA and evaluated for a wide range of operand word sizes, demonstrating its scalability and efficiency.
Application Composition and Communication Optimization in Iterative Solvers Using FPGAs
A. Rafique, Nachiket Kapre, G. Constantinides
doi: 10.1109/FCCM.2013.16

We consider the problem of minimizing communication with off-chip memory and composing multiple linear algebra kernels in iterative solvers for large-scale eigenvalue problems and linear systems of equations. While GPUs may offer higher throughput for individual kernels, overall application performance is limited by the inability to support on-chip sharing of data across kernels. In this paper, we show that higher on-chip memory capacity and superior on-chip communication bandwidth enable FPGAs to better support the composition of a sequence of kernels within these iterative solvers. We present a time-multiplexed FPGA architecture which exploits the on-chip capacity to store dependencies between kernels and the high communication bandwidth to move data. We propose a resource-constrained framework to select the optimal value of an algorithmic parameter that trades off communication and computation cost for a particular FPGA. Using the Lanczos method as a case study, we show how to minimize communication on FPGAs through this tight algorithm-architecture interaction and achieve superior performance over a GPU despite the GPU's ~5x larger off-chip memory bandwidth and ~2x greater peak single-precision floating-point performance.
Safe Overclocking of Tightly Coupled CGRAs and Processor Arrays using Razor
Alexander Brant, Ameer Abdelhadi, Douglas H. H. Sim, S. Tang, Michael Xi Yue, G. Lemieux
doi: 10.1109/FCCM.2013.63

Overclocking a CPU, operating it at a higher frequency than its speed rating, is a common practice among home-built PC enthusiasts. This practice is unsafe because modern CPUs cannot detect timing errors, and such errors can be practically undetectable by the end user. Using a timing-speculation technique such as Razor, it is possible to detect timing errors in CPUs. To date, Razor has been shown to correct only unidirectional, feed-forward processor pipelines. In this paper, we safely overclock 2D arrays by extending Razor correction to cover bidirectional communication in a tightly coupled or lockstep fashion. To recover from an error, stall wavefronts are produced which propagate across the device. Multiple errors may arise in close proximity in time and space; if the corresponding stall wavefronts collide, they merge to produce a single unified wavefront, allowing recovery from multiple errors with one stall cycle. We demonstrate the correctness and viability of our approach by constructing a proof-of-concept prototype which runs on a traditional Altera FPGA. Our approach can be applied to custom computing arrays, systolic arrays, CGRAs, and also time-multiplexed FPGAs such as those produced by Tabula. As a result, these devices can be overclocked and safely tolerate dynamic, data-dependent timing errors. Alternatively, instead of overclocking, the same technique can be used to 'undervolt' the power supply and save energy.
Acceleration of SQL Restrictions and Aggregations through FPGA-Based Dynamic Partial Reconfiguration
C. Dennl, Daniel Ziener, J. Teich
doi: 10.1109/FCCM.2013.38

SQL query processing on large database systems is recognized as one of the most important emerging disciplines of computing today. However, current approaches do not provide substantial coverage of typical query operators in hardware. In this paper, we take an important step toward higher operator coverage by proposing (a) fully dynamic data-path generation that also supports complex operators such as restrictions and aggregations. (b) We also analyze the computation times of real database queries running on an ordinary desktop computer, showing that (c) speedups ranging between 4 and 50 are obtainable by providing generative support for the important restrict and aggregate operators on FPGAs.