{"title":"Xilinx Virtex fpga高性能矩阵乘法器核心的设计与实现","authors":"S. Belkacemi, K. Benkrid, D. Crookes, A. Benkrid","doi":"10.1109/CAMP.2003.1598160","DOIUrl":null,"url":null,"abstract":"Matrix multiplication is a core operation in digital signal processing operations with a variety of applications such as image processing, computer graphics, sonar processing and robotics. This paper presents the design and implementation of a high performance, fully parallel matrix multiplication core. The core is parameterised and scalable in terms of the matrices' dimensions (row and column number) and the input data word length. Fully floorplanned FPGA configurations are generated automatically, from high-level descriptions of the matrix multiplication operation, in the form of EDIF netlists in less than 1 sec. These are specifically optimised for Xilinx Virtex FPGA chips. By exploiting the abundance of logic resources in Xilinx Virtex FPGAs (look-up tables, fast carry logic, shift registers, flip flops etc.), a fully parallel implementation of the matrix multiplier core has been achieved; with a full matrix result being generated every clock cycle. A 3times3 matrix multiplier instance consumes 2,448 Virtex slices and can run at 175 MHz on an XCV1000E-6 Virtex-E chip, thus performing over 4.7 billion MAC/sec. This leads to 175 million full 3times3 matrix result per second","PeriodicalId":443821,"journal":{"name":"2003 IEEE International Workshop on Computer Architectures for Machine Perception","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Design and implementation of a high performance matrix multiplier core for Xilinx Virtex FPGAs\",\"authors\":\"S. Belkacemi, K. Benkrid, D. Crookes, A. Benkrid\",\"doi\":\"10.1109/CAMP.2003.1598160\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Matrix multiplication is a core operation in digital signal processing operations with a variety of applications such as image processing, computer graphics, sonar processing and robotics. This paper presents the design and implementation of a high performance, fully parallel matrix multiplication core. The core is parameterised and scalable in terms of the matrices' dimensions (row and column number) and the input data word length. Fully floorplanned FPGA configurations are generated automatically, from high-level descriptions of the matrix multiplication operation, in the form of EDIF netlists in less than 1 sec. These are specifically optimised for Xilinx Virtex FPGA chips. By exploiting the abundance of logic resources in Xilinx Virtex FPGAs (look-up tables, fast carry logic, shift registers, flip flops etc.), a fully parallel implementation of the matrix multiplier core has been achieved; with a full matrix result being generated every clock cycle. A 3times3 matrix multiplier instance consumes 2,448 Virtex slices and can run at 175 MHz on an XCV1000E-6 Virtex-E chip, thus performing over 4.7 billion MAC/sec. This leads to 175 million full 3times3 matrix result per second\",\"PeriodicalId\":443821,\"journal\":{\"name\":\"2003 IEEE International Workshop on Computer Architectures for Machine Perception\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-05-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2003 IEEE International Workshop on Computer Architectures for Machine Perception\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CAMP.2003.1598160\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2003 IEEE International Workshop on Computer Architectures for Machine Perception","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CAMP.2003.1598160","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Design and implementation of a high performance matrix multiplier core for Xilinx Virtex FPGAs
Matrix multiplication is a core operation in digital signal processing operations with a variety of applications such as image processing, computer graphics, sonar processing and robotics. This paper presents the design and implementation of a high performance, fully parallel matrix multiplication core. The core is parameterised and scalable in terms of the matrices' dimensions (row and column number) and the input data word length. Fully floorplanned FPGA configurations are generated automatically, from high-level descriptions of the matrix multiplication operation, in the form of EDIF netlists in less than 1 sec. These are specifically optimised for Xilinx Virtex FPGA chips. By exploiting the abundance of logic resources in Xilinx Virtex FPGAs (look-up tables, fast carry logic, shift registers, flip flops etc.), a fully parallel implementation of the matrix multiplier core has been achieved; with a full matrix result being generated every clock cycle. A 3times3 matrix multiplier instance consumes 2,448 Virtex slices and can run at 175 MHz on an XCV1000E-6 Virtex-E chip, thus performing over 4.7 billion MAC/sec. This leads to 175 million full 3times3 matrix result per second