Title: Decimal Adders/Subtractors in FPGA: Efficient 6-input LUT Implementations
Pub Date: 2009-12-09, DOI: 10.1109/ReConFig.2009.29
Venue: 2009 International Conference on Reconfigurable Computing and FPGAs
M. Vazquez, G. Sutter, G. Bioul, J. Deschamps
This paper presents FPGA implementations of add/subtract algorithms for 10's complement BCD numbers. Carry-chain circuits have been designed for Xilinx Virtex-5 FPGAs, which are built on 6-input LUTs. Some new concepts for computing the carry-propagate (P) and carry-generate (G) functions are reviewed for carry-chain optimization. The designs are presented together with their timing and area figures. Results have been compared with 2's complement binary implementations on the same platform: for operands covering the same range, the decimal implementations showed lower delays.
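To make the P/G terminology concrete, here is a minimal Python sketch of a digit-level carry chain for a 10's complement BCD adder/subtractor. It assumes the textbook per-digit definitions G_i = (x_i + y_i >= 10) and P_i = (x_i + y_i == 9), with subtraction done as nines' complement plus an initial carry of 1; the paper's own LUT-level method for computing P and G on Virtex-5 is not reproduced here. All function names are illustrative.

```python
from typing import List, Tuple


def to_digits(value: int, n: int) -> List[int]:
    """n decimal digits of value mod 10**n, least significant first."""
    value %= 10 ** n
    return [(value // 10 ** i) % 10 for i in range(n)]


def from_digits(digits: List[int]) -> int:
    return sum(d * 10 ** i for i, d in enumerate(digits))


def to_signed(value: int, n: int) -> int:
    """Interpret an n-digit 10's complement word as a signed integer."""
    return value - 10 ** n if value >= 5 * 10 ** (n - 1) else value


def add_sub(x: int, y: int, n: int, subtract: bool = False) -> Tuple[int, int]:
    """Digit-serial model of a 10's complement BCD adder/subtractor.

    Returns (n-digit result, carry out of the most significant digit).
    """
    xd, yd = to_digits(x, n), to_digits(y, n)
    if subtract:
        yd = [9 - d for d in yd]        # nines' complement of each digit
    carry = 1 if subtract else 0        # +1 turns it into the 10's complement
    out = []
    for xi, yi in zip(xd, yd):
        s = xi + yi                     # 0 .. 18
        g = s >= 10                     # generate: carry out regardless of carry in
        p = s == 9                      # propagate: carry out only if carry in
        out.append((s + carry) % 10)
        carry = 1 if g or (p and carry) else 0   # c(i+1) = G_i + P_i * c(i)
    return from_digits(out), carry


diff, _ = add_sub(1234, 5678, 4, subtract=True)
print(diff, to_signed(diff, 4))   # 5556 -4444
```

The carry recurrence c(i+1) = G_i OR (P_i AND c(i)) is exactly what a dedicated carry chain evaluates, which is why precomputing P and G per digit matters for speed.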
Title: Enhancing the Productivity of Radio Designers with RapidRadio
Pub Date: 2009-12-09, DOI: 10.1109/ReConFig.2009.84
Venue: 2009 International Conference on Reconfigurable Computing and FPGAs
J. Surís, A. Recio, P. Athanas
This paper discusses the RapidRadio framework for signal classification and receiver deployment. The framework is a productivity-enhancing tool that reduces the knowledge base required to implement a receiver on an FPGA-based software-defined radio (SDR) platform. Its ultimate objective is to identify unknown signals and to build FPGA-based receivers capable of receiving them. The framework's ability to classify a signal and deploy a functional receiver is validated with over-the-air experiments.
Title: FPGA Implementation of Izhikevich Spiking Neural Networks for Character Recognition
Pub Date: 2009-12-09, DOI: 10.1109/ReConFig.2009.77
Venue: 2009 International Conference on Reconfigurable Computing and FPGAs
Kenneth L. Rice, M. Bhuiyan, T. Taha, Christopher N. Vutsinas, M. C. Smith
There has recently been a strong push to examine biological-scale simulations of neuromorphic algorithms in order to achieve stronger inference capabilities than current computing algorithms offer. The recent Izhikevich spiking neuron model is ideally suited to such large-scale cortical simulations because of its efficiency and biological accuracy. In this paper we explore the feasibility of using FPGAs for large-scale simulations of the Izhikevich model. We developed a modular processing element that evaluates a large number of Izhikevich spiking neurons in a pipelined manner; this approach lets the model scale easily to larger FPGAs. For this study we used a character recognition algorithm based on the Izhikevich model and scaled it up to over 9000 neurons. The FPGA implementation of the algorithm on a Xilinx Virtex-4 achieved a speedup of approximately 8.5 times over an equivalent software implementation on a 2.2 GHz AMD Opteron core. Our results indicate that FPGAs are suitable for large-scale cortical simulations using the Izhikevich spiking neuron model.
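For reference, the Izhikevich model updates each neuron with v' = 0.04v^2 + 5v + 140 - u + I and u' = a(bv - u), and when v reaches 30 mV it resets v to c and adds d to u. The sketch below is a minimal forward-Euler simulation of independent neurons in Python, assuming the published "regular spiking" parameters (a = 0.02, b = 0.2, c = -65, d = 8) and a 0.5 ms step; it does not model the paper's character recognition network, its synaptic connectivity, or its fixed-point hardware arithmetic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IzhikevichParams:
    a: float = 0.02   # time scale of the recovery variable u
    b: float = 0.2    # sensitivity of u to the membrane potential v
    c: float = -65.0  # reset value of v after a spike (mV)
    d: float = 8.0    # increment of u after a spike


def simulate(currents: List[float], steps: int, dt: float = 0.5,
             params: Optional[IzhikevichParams] = None) -> List[int]:
    """Forward-Euler simulation of independent Izhikevich neurons.

    Returns the number of spikes each neuron emitted over `steps` steps.
    """
    p = params if params is not None else IzhikevichParams()
    v = [p.c] * len(currents)          # membrane potential (mV)
    u = [p.b * p.c] * len(currents)    # recovery variable
    spikes = [0] * len(currents)
    for _ in range(steps):
        # One pass over all neurons per time step, the way a single
        # pipelined processing element would stream them one after another.
        for i, current in enumerate(currents):
            dv = 0.04 * v[i] ** 2 + 5.0 * v[i] + 140.0 - u[i] + current
            du = p.a * (p.b * v[i] - u[i])
            v[i] += dt * dv
            u[i] += dt * du
            if v[i] >= 30.0:           # spike: apply the reset rule
                spikes[i] += 1
                v[i] = p.c
                u[i] += p.d
    return spikes


# 1000 ms of simulated time: no input current vs. a constant input of 10.
print(simulate([0.0, 10.0], steps=2000))
```

Because each neuron's update depends only on its own state and input, the inner loop maps naturally onto a pipeline that accepts one neuron per clock.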
Title: Effects of Simplistic Online Synthesis for AMIDAR Processors
Pub Date: 2009-12-09, DOI: 10.1109/ReConFig.2009.21
Venue: 2009 International Conference on Reconfigurable Computing and FPGAs
Stefan Döbrich, C. Hochberger
Future chip technologies will change the way we approach hardware design: (1) logic resources will be available in vast amounts, and (2) engineering specialized designs for particular applications will no longer be the general approach, because non-recurring expenses will grow tremendously. We therefore believe that online synthesis, performed during the execution of an application, is one way to overcome these problems. In this paper we show that even a relatively simplistic synthesis approach can have a strong impact on the performance of compute-intensive applications.
Title: PCIREX: A Fast Prototyping Platform for TMR Dynamically Reconfigurable Systems
Pub Date: 2009-12-09, DOI: 10.1109/ReConFig.2009.33
Venue: 2009 International Conference on Reconfigurable Computing and FPGAs
A. Astarloa, Jesús Lázaro, U. Bidarte, A. Zuloaga, J. Jiménez
This paper presents a PCI Express-based platform for analyzing and evaluating designs that combine Triple Modular Redundancy (TMR) and Dynamic Reconfiguration to provide fault tolerance and self-repair capabilities. The paper describes the general architecture of the platform and demonstrates its functionality by implementing a self-repairing CAN gateway.
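The core of any TMR scheme is a 2-out-of-3 majority voter. The Python sketch below shows a bitwise voter that also reports which replicas disagree with the voted result, which is the kind of signal a self-repairing system could use to decide which region to rewrite by dynamic reconfiguration. The function names and the idea of flagging "suspect" replicas are illustrative assumptions, not PCIREX's actual interface.

```python
from typing import List, Sequence, Tuple


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three replica output words."""
    return (a & b) | (a & c) | (b & c)


def vote(outputs: Sequence[int]) -> Tuple[int, List[int]]:
    """Return the voted word and the indices of replicas that disagree with it."""
    a, b, c = outputs
    voted = majority(a, b, c)
    disagreeing = [i for i, word in enumerate(outputs) if word != voted]
    return voted, disagreeing


# A single upset in replica 1 is masked; replica 1 is flagged so that a
# (hypothetical) controller could rewrite its region by dynamic reconfiguration.
word, suspects = vote([0x5A, 0x5B, 0x5A])
print(hex(word), suspects)   # 0x5a [1]
```

Note that bitwise voting masks faults bit by bit: if different replicas fail in different bit positions, the voted word can still be correct even though every replica disagrees with it.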
Title: Triple Line-Based Playout for Go - An Accelerator for Monte Carlo Go
Pub Date: 2009-12-09, DOI: 10.1109/ReConFig.2009.75
Venue: 2009 International Conference on Reconfigurable Computing and FPGAs
Kenichi Koizumi, M. Inaba, K. Hiraki, Y. Ishii, T. Miyoshi, Kazuki Yoshizoe
After the computer “Deep Blue” defeated world chess champion Garry Kasparov in 1997, researchers studying computer board games turned their attention to the game of Go. Go is known to be more difficult for computers than chess or shogi because (1) its search space is much larger, (2) it is difficult to define an appropriate position evaluation function, and (3) a single move can sometimes change a position globally. Recently, a new method called Monte Carlo Go has been developed, which evaluates a position by running Monte Carlo simulations (playouts). Monte Carlo Go has increased the strength of Computer-Go programs, and that strength depends entirely on the number of simulations. Several attempts have been made to accelerate the simulations, e.g., with cluster systems and FPGAs. Clusters yield good results but are very expensive. FPGA acceleration, on the other hand, has not been easy because it tends to consume a large share of FPGA resources: previously it was feasible for smaller boards such as the 9 × 9 grid, but not for the standard 19 × 19 board. In this paper, we propose triple line-based playout for Go (TLPG), a hardware algorithm for generating simulations on an FPGA. By reproducing global information redundantly, TLPG generates simulations using only local operations; this allows a compact hardware implementation, so TLPG can handle both 9 × 9 and 19 × 19 Go boards. We implement TLPG on a Xilinx Virtex-5 (XC5VFX70T-1FF1136) and evaluate it: TLPG performs 40,649 playouts per second on a 9 × 9 board and 4,668 playouts per second on a 19 × 19 board.
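Why playout throughput translates into playing strength can be illustrated with the standard error of a Monte Carlo win-rate estimate, which shrinks as 1/sqrt(n). The Python sketch below assumes each playout is an independent win/loss trial with win probability p (0.5 is the worst case) and plugs in the throughput figures from the abstract; it is a generic statistical illustration, not part of TLPG itself.

```python
import math

# Throughput figures reported in the abstract (playouts per second).
PLAYOUTS_PER_SECOND = {"9x9": 40_649, "19x19": 4_668}


def win_rate_std_error(playouts: float, p: float = 0.5) -> float:
    """Standard error of a win-rate estimate from `playouts` independent playouts.

    Assumes each playout is an independent Bernoulli trial with win
    probability p; p = 0.5 gives the worst case.
    """
    return math.sqrt(p * (1.0 - p) / playouts)


for board, rate in PLAYOUTS_PER_SECOND.items():
    # Error after one second of thinking time on one accelerator.
    print(board, f"{win_rate_std_error(rate * 1.0):.5f}")
# 9x9 0.00248
# 19x19 0.00732

# Seconds needed on 19x19 to match the 9x9 error obtained in 1 s:
print(round(PLAYOUTS_PER_SECOND["9x9"] / PLAYOUTS_PER_SECOND["19x19"], 1))  # 8.7
```

Since the error depends only on the number of playouts, matching a given accuracy on the 19 × 19 board takes roughly 8.7 times as long as on the 9 × 9 board at these rates.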
Title: Runtime Temporal Partitioning Assembly to Reduce FPGA Reconfiguration Time
Pub Date: 2009-12-09, DOI: 10.1109/ReConFig.2009.61
Venue: 2009 International Conference on Reconfigurable Computing and FPGAs
Abelardo Jara-Berrocal, A. Gordon-Ross
Large applications that exceed the available FPGA resources must time-multiplex those resources using smaller hardware modules. To orchestrate this time-multiplexing, temporal partitioning divides the hardware modules into multiple subsets, each of which fits within the available resources. During a temporal-partition transition, the FPGA is reconfigured with the next temporal partition. However, reconfiguration time can impose significant performance overhead, because the entire FPGA fabric must be reconfigured even if only a small portion has changed. Partially reconfigurable (PR) FPGAs can reduce reconfiguration time by reconfiguring only the portions of the fabric that differ. In this paper, we present a design methodology that uses a simulated-annealing-based module placement optimization engine to minimize FPGA reconfiguration overhead by exploiting module overlap across successive temporal partitions. Experimental results show that our methodology reduces FPGA reconfiguration time by 44% on average.
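The idea of exploiting module overlap can be sketched with a toy simulated-annealing placer in Python. It assumes a deliberately simplified model: the FPGA has K equally sized PR slots, every temporal partition holds exactly K modules, and a slot costs one reconfiguration whenever its module differs from the previous partition's module in that slot. The cost model, move set (swapping two slots within one partition), and cooling schedule are all assumptions for illustration, not the paper's engine.

```python
import math
import random
from typing import List, Sequence, Tuple

Placement = List[List[str]]  # placement[t][slot] = module in that slot during partition t


def reconfig_cost(placement: Placement) -> int:
    """Count slots whose module changes between consecutive temporal partitions."""
    return sum(
        before != after
        for prev, nxt in zip(placement, placement[1:])
        for before, after in zip(prev, nxt)
    )


def anneal(partitions: Sequence[Sequence[str]], iters: int = 5000,
           t0: float = 2.0, alpha: float = 0.999,
           seed: int = 0) -> Tuple[Placement, int]:
    """Simulated annealing over the slot order of each temporal partition."""
    rng = random.Random(seed)
    current = [list(p) for p in partitions]      # start from the listed order
    cost = reconfig_cost(current)
    best, best_cost = [list(p) for p in current], cost
    temp = t0
    for _ in range(iters):
        t = rng.randrange(len(current))               # pick a partition
        i, j = rng.sample(range(len(current[t])), 2)  # and two of its slots
        current[t][i], current[t][j] = current[t][j], current[t][i]
        delta = reconfig_cost(current) - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost += delta                              # accept the swap
            if cost < best_cost:
                best, best_cost = [list(p) for p in current], cost
        else:
            current[t][i], current[t][j] = current[t][j], current[t][i]  # undo
        temp *= alpha                                  # geometric cooling
    return best, best_cost


parts = [["A", "B", "C"], ["C", "D", "A"], ["A", "C", "E"]]
print(reconfig_cost(parts))   # 6 with the naive, as-listed placement
best, cost = anneal(parts)
print(best, cost)
```

In this example, modules D and E are new in their partitions, so at least 2 slot reconfigurations are unavoidable; the annealer can only approach that bound by keeping A and C in fixed slots.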
Title: Fuzzy Control for Cyclist Robot Stability Using FPGAs
Pub Date: 2009-12-09, DOI: 10.1109/ReConFig.2009.53
Venue: 2009 International Conference on Reconfigurable Computing and FPGAs
C. Castro, C. Llanos, Walter de Britto Vidal Filho, L. Coelho
This paper presents an FPGA (Field Programmable Gate Array) implementation of a fuzzy controller for a robot that rides a bicycle, using the well-known Acrobot model. The overall system follows a hardware/software co-design approach: it combines a MicroBlaze FPGA embedded processor with a fuzzy controller implemented directly in hardware, and the two are connected via the Fast Simplex Link (FSL) bus. The proposed design methodology first designs the fuzzy controller in software for simulation and testing, taking into account the mathematical model of the plant. The controller is then synthesized into the hardware description language VHDL using the Xfuzzy 2.0 tool. The fuzzy controller has two modules, each producing a torque control variable. The first module receives the position and angular speed of the first link of the Acrobot system, whereas the second module receives the position and angular speed of the second link. The final torque is computed in the MicroBlaze using two gains, each representing the priority applied to one fuzzy module. These gains were determined experimentally through several simulations run in the MATLAB environment.
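The two-module structure can be sketched in Python as follows. Each module maps one link's normalized angle and angular speed to a torque with a 3 × 3 rule base, and the two outputs are then combined with priority gains, as the software side would do. Everything concrete here is an assumption for illustration: the triangular Negative/Zero/Positive sets on [-1, 1], the zero-order Sugeno inference (min for AND, weighted average), the rule consequents, the sign convention, and the gain values g1 = 0.7 and g2 = 0.3. This is not the Xfuzzy 2.0 output or the paper's tuned controller, and the linear combination of the two gains is itself an assumed reading of "using two gains".

```python
from typing import Dict, Tuple


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical normalized fuzzy sets: -1 = Negative, 0 = Zero, 1 = Positive.
SETS: Dict[int, Tuple[float, float, float]] = {
    -1: (-2.0, -1.0, 0.0),
    0: (-1.0, 0.0, 1.0),
    1: (0.0, 1.0, 2.0),
}


def fuzzy_module(angle: float, speed: float, max_torque: float = 1.0) -> float:
    """Zero-order Sugeno controller: 9 rules, min for AND, weighted average."""
    angle = max(-1.0, min(1.0, angle))   # clip inputs to the normalized range
    speed = max(-1.0, min(1.0, speed))
    num = den = 0.0
    for i, abc_i in SETS.items():
        for j, abc_j in SETS.items():
            w = min(tri(angle, *abc_i), tri(speed, *abc_j))  # rule firing strength
            consequent = -max_torque * (i + j) / 2           # push back toward upright
            num += w * consequent
            den += w
    return num / den   # den > 0: the sets cover [-1, 1]


def total_torque(q1: float, dq1: float, q2: float, dq2: float,
                 g1: float = 0.7, g2: float = 0.3) -> float:
    """Combine both module outputs with (hypothetical) priority gains."""
    return g1 * fuzzy_module(q1, dq1) + g2 * fuzzy_module(q2, dq2)


print(total_torque(0.5, 0.0, 0.0, 0.0))
```

With the robot upright and at rest, every module outputs zero torque; a positive tilt of the first link produces a negative (restoring) torque under the assumed sign convention.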
Title: Design and Implementation of a Configurable Interleaver/Deinterleaver for Turbo Codes in 3GPP Standard
Pub Date: 2009-12-09, DOI: 10.1109/ReConFig.2009.16
Venue: 2009 International Conference on Reconfigurable Computing and FPGAs
Hector Borrayo-Sandoval, R. Parra-Michel, L. F. Gonzalez-Perez, Fernando Landeros Printzen, C. F. Uribe
During the last decade, Turbo codes have gained increasing importance in channel coding because of their good error-correction performance. A key component of Turbo codes is the interleaver/deinterleaver pair, often designed as a reconfigurable coprocessor able to cope with the large variability in data length found in the newest communication standards. In this work we introduce a configurable interleaver architecture for the turbo decoder of the 3rd Generation Partnership Project (3GPP) standard. It is implemented using the idea of “iterative modulo computation”. Additionally, the presented solution not only generates the interleaved addresses but also manages the flow of data streams through the interleaver. The architecture and FPGA implementation results are also presented.
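One plausible reading of "iterative modulo computation" is that addresses of the form (j × r) mod m are produced by accumulating r and subtracting m at most once per step, so no multiplier or divider is needed. The Python sketch below applies that idea to a recursion modeled on the base sequence of the 3GPP intra-row permutation, s(0) = 1 and s(j) = (v × s(j-1)) mod p, with U(j) = s((j × r) mod (p - 1)), where v is a primitive root of the prime p and r is coprime to p - 1. It omits the full standard's row/column permutations and address pruning; the function names and the small example values are illustrative assumptions.

```python
from typing import List


def base_sequence(p: int, v: int) -> List[int]:
    """s(0) = 1 and s(j) = (v * s(j-1)) mod p for j = 1 .. p-2."""
    s = [1]
    for _ in range(p - 2):
        s.append((v * s[-1]) % p)
    return s


def iterative_indices(r: int, modulus: int, count: int) -> List[int]:
    """(j * r) mod modulus for j = 0 .. count-1 using only add and compare-subtract."""
    step = r % modulus              # reduce once so that step < modulus
    idx, out = 0, []
    for _ in range(count):
        out.append(idx)
        idx += step                 # idx + step < 2 * modulus ...
        if idx >= modulus:          # ... so one conditional subtraction suffices
            idx -= modulus
    return out


def intra_row_permutation(p: int, v: int, r: int) -> List[int]:
    """U(j) = s((j * r) mod (p - 1)) for j = 0 .. p-2."""
    s = base_sequence(p, v)
    return [s[k] for k in iterative_indices(r, p - 1, p - 1)]


print(intra_row_permutation(7, 3, 5))   # [1, 5, 4, 6, 2, 3]
```

In hardware, the add-and-conditional-subtract step maps to one adder, one comparator, and a multiplexer per clock, which keeps the address generator small for every supported block length.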
Title: FPGA Implementation of a Decimal Floating-Point Accurate Scalar Product Unit with a Parallel Fixed-Point Multiplier
Pub Date: 2009-12-09, DOI: 10.1109/ReConFig.2009.17
Venue: 2009 International Conference on Reconfigurable Computing and FPGAs
Malte Baesler, T. Teufel
Decimal Floating Point (DFP) operations are very important for applications that cannot tolerate errors from conversions between binary and decimal formats, for instance scientific, commercial, financial, and Internet-based applications. In this paper we present a parallel decimal fixed-point multiplier designed to exploit the features of FPGAs. Our multiplier is based on BCD recoding schemes, fast partial product generation, and a BCD-4221 carry-save adder reduction tree. Furthermore, we extend the multiplier with an accurate scalar product unit in order to provide this important operation with the smallest possible rounding error, as proposed in. Finally, the design is implemented and tested on a Xilinx Virtex-II Pro FPGA platform.
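The benefit of an accurate scalar product unit is that all products are accumulated exactly and the result is rounded only once. The Python sketch below contrasts this with a short accumulator that rounds after every operation, using the standard `decimal` module. The exact path uses a Kulisch-style "long accumulator" emulated with Python's arbitrary-precision integers; it assumes finite operands (no infinities or NaNs) and is an illustration of the principle, not the paper's hardware datapath.

```python
from decimal import Context, Decimal
from typing import Sequence, Tuple


def naive_dot(xs: Sequence[Decimal], ys: Sequence[Decimal], prec: int) -> Decimal:
    """Round after every multiply and add, as a short accumulator would."""
    ctx = Context(prec=prec)
    acc = Decimal(0)
    for x, y in zip(xs, ys):
        acc = ctx.add(acc, ctx.multiply(x, y))
    return acc


def _coeff_exp(d: Decimal) -> Tuple[int, int]:
    """Split a finite Decimal into (signed integer coefficient, exponent)."""
    sign, digits, exp = d.as_tuple()
    coeff = int("".join(map(str, digits)))
    return (-coeff if sign else coeff), exp


def accurate_dot(xs: Sequence[Decimal], ys: Sequence[Decimal], prec: int) -> Decimal:
    """Accumulate every product exactly (a 'long accumulator'), round once."""
    terms = []
    for x, y in zip(xs, ys):
        (cx, ex), (cy, ey) = _coeff_exp(x), _coeff_exp(y)
        terms.append((cx * cy, ex + ey))        # exact product
    if not terms:
        return Decimal(0)
    low = min(e for _, e in terms)
    total = sum(c * 10 ** (e - low) for c, e in terms)   # exact integer sum
    return Context(prec=prec).plus(Decimal(f"{total}E{low}"))  # single rounding


xs = [Decimal("1E7"), Decimal("1"), Decimal("-1E7")]
ys = [Decimal("1")] * 3
print(naive_dot(xs, ys, 7), accurate_dot(xs, ys, 7))
```

With 7 significant digits, the naive accumulator loses the 1 when it is added to 10^7, so it returns zero, while the exact accumulator returns 1.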