Memory Optimized Architecture for Efficient Gauss-Jordan Matrix Inversion

2007 3rd Southern Conference on Programmable Logic. Pub Date: 2007-06-18. DOI: 10.1109/SPL.2007.371720

Gonçalo


Citations: 16

Abstract




This paper presents a new architecture for an efficient Gauss-Jordan matrix inversion algorithm on reconfigurable hardware platforms. The results show that currently available reconfigurable computing technology can easily achieve significantly higher floating-point performance than high-end CPUs running state-of-the-art routines for large matrix operations. For common reconfigurable systems, where the FPGAs are directly coupled to the on-board memory, the achievable performance scales directly with the number of simultaneous memory accesses that can be realized. A new dedicated reconfigurable architecture is proposed and analysed; the results show a 2x performance improvement over the previous implementation while using only half of the memory and half of the floating-point units. Benchmarking against Matlab, which features high-performance matrix inversion routines, shows that a 100 MHz FPGA can easily surpass the performance of a 3.2 GHz Intel Pentium IV processor. This is possible with only 5 double-port memory banks or 9 single-port memory banks connected to the FPGA.
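For readers unfamiliar with the algorithm the abstract names, the following is a minimal software sketch of Gauss-Jordan inversion on the augmented matrix [A | I]. It is not the paper's hardware architecture: the function name `gauss_jordan_inverse`, the `eps` singularity threshold, and the use of partial pivoting are assumptions made for illustration only, and the abstract does not say which pivoting strategy the FPGA design uses.

```python
def gauss_jordan_inverse(a, eps=1e-12):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination.

    Illustrative reference only; names and the eps threshold are assumptions.
    """
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("matrix must be square")

    # Build the augmented matrix [A | I].
    aug = [
        [float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]

    for col in range(n):
        # Partial pivoting: choose the row with the largest magnitude in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < eps:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]

        # Normalize the pivot row so the pivot element becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]

        # Eliminate this column from every other row (above and below the pivot).
        for r in range(n):
            if r != col:
                f = aug[r][col]
                if f != 0.0:
                    aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]

    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]
```

Each pivot step reads and rewrites every row of the augmented matrix, so the work is dominated by memory traffic. This is consistent with the abstract's remark that performance scales with the number of simultaneous memory accesses.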
