Pub Date: 2009-12-09 DOI: 10.1109/ReConFig.2009.62
Rachid Dafali, J. Diguet
This paper presents our approach to the reconfiguration needs of Networks-on-Chip (NoCs). We introduce our motivations and then detail our strategy, which is based on local (delegate) and global configuration managers. Finally, we describe an original self-adaptive Network Interface architecture, which is part of the configuration manager and is in charge of run-time buffer sizing. The challenge is a tradeoff between the complexity of implementing decisions and the expected gains in cost and performance. Our results, obtained on an FPGA within an emulator board, demonstrate the value of the proposed approach.
Title: Self-Adaptive Network Interface (SANI): Local Component of a NoC Configuration Manager. In: 2009 International Conference on Reconfigurable Computing and FPGAs.
Pub Date: 2009-12-09 DOI: 10.1109/ReConFig.2009.51
L. Kirischian, V. Dumitriu, P. Chun
The possibility of distributing FPGA resources in the temporal domain for multi-modal and multi-task workloads conceptually allows virtualization of logic, communication and input/output resources, similar to memory virtualization in advanced conventional computers (e.g. superscalar). This, in turn, can dramatically increase the cost-effectiveness of FPGA-based Reconfigurable Computing Systems (RCS). In the presented “proof-of-concept” research the following topics have been investigated, developed and tested: i) the architecture of a platform to support the dynamic allocation of Application Specific Virtual Processors (ASVPs), ii) mechanisms for run-time on-chip assembly of ASVPs from Virtual Hardware Components (VHCs) and iii) mechanisms for run-time on-chip relocation of components (VHCs) in predetermined regions of the FPGA device. These mechanisms have been implemented and tested on a specially developed platform: the Multi-task Adaptive Reconfigurable System (MARS) Platform. MARS was applied to prototyping a high-performance multi-mode stereo-vision system (200 fps) for the next generation of space-borne computing platforms.
Title: Virtualization of Computing Resources in RCS for Multi-task Stream Applications
Pub Date: 2009-12-09 DOI: 10.1109/ReConFig.2009.73
F. Mayer-Lindenberg
We describe a simple and fast approach to FPGA programming that makes it possible to exploit the numeric processing capabilities of recent FPGA chips efficiently. It consists of programming on top of a library of complex components for FPGA-based scalable processor networks and providing a high-level programming interface to it. The FPGA application is presented as a network of processes, which a compiler automatically transforms into a corresponding network of simple processor components. The compiler then generates individual program code for each of the simple processors. The coarse-grained processor network is eventually compiled into an FPGA configuration bitstream using standard FPGA tools at close-to-interactive speeds. Our approach has the additional benefit of being fully compatible with processor programming and extensible to mixed multi-component FPGA and processor systems. An experimental implementation of the process mapping scheme uses the p-Nets language, which provides convenient structures for presenting the application processes and supports composite targets, including processors linked to the FPGA chips. The evaluation of our concept on some FPGA chips includes an estimate of their floating-point processing performance.
Title: High-Level FPGA Programming through Mapping Process Networks to FPGA Resources
Pub Date: 2009-12-09 DOI: 10.1109/ReConFig.2009.54
P. Yalla, J. Kaps
The advent of new low-power Field Programmable Gate Arrays (FPGAs) for battery-powered devices opens a host of new applications to FPGAs. In order to provide security on resource-constrained devices, lightweight cryptographic algorithms have been developed. However, there has not been much research on porting these algorithms to FPGAs. In this paper we propose lightweight cryptography for FPGAs by introducing block-cipher-independent optimization techniques for Xilinx Spartan3 FPGAs and applying them to the lightweight cryptographic algorithms HIGHT and Present. Our implementations are the first reported of these block ciphers on FPGAs. Furthermore, they are the smallest block cipher implementations on FPGAs, using only 117 and 91 slices respectively, which makes them comparable in size to stream cipher implementations. Both are less than half the size of the AES implementation by Chodowiec and Gaj, without using block RAMs. Present’s throughput-over-area ratio of 240 Kbps/slice is similar to that of AES; HIGHT, however, outperforms both by far with 720 Kbps/slice.
Title: Lightweight Cryptography for FPGAs
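The slice counts and throughput-over-area ratios quoted in the abstract can be combined into a rough estimate of absolute throughput. The sketch below assumes that the ratio is simply throughput in Kbps divided by area in slices; the function name `implied_throughput_kbps` is hypothetical, and because the quoted figures are rounded, the results are only approximate.

```python
# Back-of-envelope estimate from the figures quoted in the abstract.
# Assumption: ratio (Kbps/slice) = throughput (Kbps) / area (slices),
# so throughput is recovered by multiplying the two quoted numbers.

def implied_throughput_kbps(kbps_per_slice: float, slices: int) -> float:
    """Return the throughput implied by a throughput/area ratio and an area."""
    return kbps_per_slice * slices

# (ratio in Kbps/slice, area in slices), as quoted in the abstract.
CIPHERS = {"HIGHT": (720, 117), "Present": (240, 91)}

if __name__ == "__main__":
    for name, (ratio, slices) in CIPHERS.items():
        kbps = implied_throughput_kbps(ratio, slices)
        print(f"{name}: roughly {kbps / 1000:.1f} Mbps from {slices} slices")
```

Under that assumption, HIGHT's threefold advantage in ratio together with its slightly larger area corresponds to a bit less than four times the implied throughput of Present.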
Pub Date: 2009-12-09 DOI: 10.1109/ReConFig.2009.35
M. Jeitler, J. Lechner
While the stability and robustness of synchronous circuits become increasingly problematic due to shrinking feature sizes, delay-insensitive asynchronous circuits are supposed to provide inherent protection against various fault types. However, results on experimental evaluation and analysis of these fault tolerance properties are scarce, mainly due to the lack of suitable prototyping platforms. Using a soft-core processor as an example, this paper shows how an off-the-shelf FPGA can be used for asynchronous Four State Logic designs, on which future fault injection experiments will be conducted.
Title: Speeding up Fault Injection for Asynchronous Logic by FPGA-Based Emulation
Pub Date: 2009-12-09 DOI: 10.1109/ReConFig.2009.40
M. Juliato, C. Gebotys
This paper introduces the specialization of a NIOS2 processor targeting the computation of message authentication codes and integrity checks in constrained environments. Several hardware/software partitioning levels are considered, ranging from simple functions implemented as custom instructions to complete algorithms implemented as peripherals. Our experimental results show that implementing the functions Sum, Sig, Ch and Maj as custom instructions accelerates SHA-256 and HMAC by factors of 1.38 and 1.36 respectively, while keeping a small area footprint. If the entire SHA-256 algorithm is implemented as a peripheral, the hash computation is performed 11 times faster while the program size decreases by 16%. Furthermore, the HMAC/SHA-256 peripheral accelerates the computation of a message authentication code 19 times with a 26% smaller program. These results allow the computational platform of constrained embedded systems to be specialized to the processing requirements of cryptographic applications that perform message authentication and integrity checks.
Title: Tailoring a Reconfigurable Platform to SHA-256 and HMAC through Custom Instructions and Peripherals
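For readers unfamiliar with the functions named in the abstract, the following is a minimal software sketch of the SHA-256 logical functions as defined in FIPS 180-4. It assumes that the abstract's "Sum" and "Sig" refer to the standard's upper-case and lower-case sigma functions; the Python names used here are illustrative and are not taken from the paper.

```python
# SHA-256 logical functions on 32-bit words, following FIPS 180-4.
# Assumption: "Sum" in the abstract = big sigma, "Sig" = small sigma.

MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def ch(x: int, y: int, z: int) -> int:
    """Choose: each bit of x selects the matching bit of y (1) or z (0)."""
    return ((x & y) ^ (~x & z)) & MASK32


def maj(x: int, y: int, z: int) -> int:
    """Majority: each output bit is the majority vote of the three inputs."""
    return (x & y) ^ (x & z) ^ (y & z)


def big_sigma0(x: int) -> int:
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x: int) -> int:
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def small_sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)
```

Each function is a handful of rotations, shifts and bitwise operations on one to three words, which is consistent with the abstract's description of them as simple functions suited to custom instructions.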
Pub Date: 2009-12-09 DOI: 10.1109/ReConFig.2009.24
Jochen Strunk, Toni Volkmer, W. Rehm, H. Schick
This paper examines the feasibility of utilizing a grid of asynchronously clocked run-time reconfigurable modules (RTRMs) on a dynamically and partially reconfigurable (DPR) FPGA. In contrast to the synchronously clocked grids studied in previous research, the design, implementation, performance and resource utilization of an asynchronously clocked grid are shown. Such a run-time reconfigurable (RTR) grid on an FPGA can be utilized to dynamically offload compute functions on a host-coupled system, providing multi-user and multi-context execution in response to user demands. For embedded systems it can be utilized as a highly dynamic platform that provides functional enhancement through module replacement at run-time. The presented platform leverages synthesis and development constraints and is able to increase the overall throughput by allowing multiple clock domains within the grid. The performance and the additional resource utilization of handling multiple clock domains are compared to those of synchronously clocked grids. As a proof of concept, a case study with a grid of 47 RTRMs is conducted on state-of-the-art Virtex-5 FPGAs.
Title: Design and Performance of a Grid of Asynchronously Clocked Run-Time Reconfigurable Modules on a FPGA
Pub Date: 2009-12-09 DOI: 10.1109/ReConFig.2009.48
Arghavan Asad, A. E. Zonouz, M. Seyrafi, M. Soryani, M. Fathy
Networks-on-Chip (NoCs) have been proposed as the only efficient and scalable solution for providing global on-chip communication in any large VLSI design. Simultaneously, power dissipation issues have grown to such importance that they now constrain attainable performance. The large value of power consumption, relative to the active power, can therefore have serious implications for the feasibility of deploying NoCs. If NoCs are to be accepted, their full power implications need to be known. Moreover, these power characteristics must be accurately understood across the large possible design space of NoCs. Blocking time is one of the factors that affect NoC power consumption. In this paper we present a Markovian model for evaluating the amount of dissipated power that comes from packet blocking, and we show the effects of blocking time on the total power consumption of on-chip networks.
Title: Modeling and Analyzing of Blocking Time Effects on Power Consumption in Network-on-Chips
Pub Date: 2009-12-09 DOI: 10.1109/ReConFig.2009.81
C. Torres-Huitzil
This paper presents a feasibility study of an efficient digital hardware implementation of a neural model that generates locomotion patterns of periodic rhythmic movements, inspired by the biological neural networks found in animal nervous systems called Central Pattern Generators (CPGs). The proposed implementation contains a dedicated digital module that mimics the functionality and organization of the fundamental Amari-Hopfield CPG. This module is attached to an embedded processor running the uClinux operating system. The present paper deals only with the implementation of the basic CPG component and with how to embed it under a System on a Chip (SoC) approach so that it can be controlled by external commands in a high-level, transparent way for application development. The system is implemented on a Field Programmable Gate Array (FPGA) device, providing a compact, flexible and expandable solution for generating periodic rhythmic patterns in robot control applications. According to experimental results, the architecture can be used as a basis for a biomimetic intelligent embedded control platform for articulated autonomous robots.
Title: On the Implementation of Central Pattern Generators for Periodic Rhythmic Locomotion
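To give a rough idea of what an Amari-Hopfield CPG unit computes, the sketch below integrates a two-neuron (excitatory/inhibitory) model with forward Euler steps in floating point. The equation form, the transfer function and every parameter value are assumptions taken from the general CPG literature rather than from this paper, which describes a dedicated digital module; the values are illustrative and are not tuned to guarantee oscillation.

```python
import math

# Assumed two-neuron Amari-Hopfield form (not confirmed by the abstract):
#   tau * du/dt = -u + a*f(u) - c*f(v) + s_u
#   tau * dv/dt = -v + b*f(u) - d*f(v) + s_v
# with transfer function f(x) = (1 + tanh(mu * x)) / 2.


def transfer(x: float, mu: float) -> float:
    """Sigmoidal transfer function with output in the range [0, 1]."""
    return 0.5 * (1.0 + math.tanh(mu * x))


def simulate(steps=2000, dt=0.01, tau=1.0, mu=1.0,
             a=1.7, b=1.8, c=1.8, d=1.7,   # hypothetical coupling weights
             s_u=0.0, s_v=0.0,             # hypothetical external inputs
             u0=0.1, v0=0.0):
    """Integrate the model with forward Euler and return the (u, v) trace."""
    u, v = u0, v0
    trace = []
    for _ in range(steps):
        fu, fv = transfer(u, mu), transfer(v, mu)
        du = (-u + a * fu - c * fv + s_u) / tau
        dv = (-v + b * fu - d * fv + s_v) / tau
        u, v = u + dt * du, v + dt * dv
        trace.append((u, v))
    return trace


if __name__ == "__main__":
    for u, v in simulate()[::400]:
        print(f"u={u:+.4f}  v={v:+.4f}")
```

A hardware version would typically replace the floating-point arithmetic and the `tanh` call with fixed-point operations and a lookup table or piecewise approximation; that choice is outside what the abstract states.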
Pub Date: 2009-12-09 DOI: 10.1109/ReConFig.2009.70
R. Arce-Nazario, E. Orozco, D. Bollman
Multivariate polynomial interpolation is a key computation for the reverse engineering of genetic networks modeled by finite fields. Faster implementations of such algorithms are needed to cope with the increasing quantity and complexity of genetic data. Our implementation of an interpolation methodology on an FPGA has led us to identify a systolic array-based hardware architecture that is useful for performing at least three interpolation sub-tasks: Boolean cover, uniqueness, and multivariate polynomial addition. We present a generalization of these algorithms that simplifies mapping to the systolic-array structure, as well as control and storage considerations that guarantee correct results when the input sequence is longer than the processing array. The three interpolation sub-tasks were modeled and implemented on an FPGA using the proposed structure, obtaining speedups of up to 172x compared to a software implementation while achieving low resource utilization.
Title: A Systolic Array Based Architecture for Implementing Multivariate Polynomial Interpolation Tasks
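As a point of reference for one of the three sub-tasks, the sketch below shows what multivariate polynomial addition over a prime field GF(p) computes in plain software. It does not model the systolic array, whose mapping the abstract does not describe; the sparse dictionary representation and the name `poly_add` are assumptions made for illustration.

```python
from typing import Dict, Tuple

# A monomial is a tuple of exponents, one per variable; a polynomial maps
# monomials to coefficients in GF(p). This representation is an assumption.
Monomial = Tuple[int, ...]
Poly = Dict[Monomial, int]


def poly_add(f: Poly, g: Poly, p: int) -> Poly:
    """Add two sparse multivariate polynomials with coefficients mod prime p."""
    total: Poly = {}
    for poly in (f, g):
        for monomial, coeff in poly.items():
            total[monomial] = (total.get(monomial, 0) + coeff) % p
    # Drop terms whose coefficients cancelled to zero.
    return {m: c for m, c in total.items() if c != 0}


if __name__ == "__main__":
    # Over GF(3) in variables (x, y): f = 2x + y, g = x + 2.
    f = {(1, 0): 2, (0, 1): 1}
    g = {(1, 0): 1, (0, 0): 2}
    print(poly_add(f, g, 3))  # the x terms cancel because 2 + 1 = 0 mod 3
```

Matching equal monomials and combining their coefficients is the step a processing array would have to distribute across its cells; how the paper does so is not covered by this sketch.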