A reconfigurable low-power high-performance matrix multiplier design

R. Lin
Proceedings IEEE 2000 First International Symposium on Quality Electronic Design (Cat. No. PR00525), 2000-03-20
DOI: 10.1109/ISQED.2000.838891 (https://doi.org/10.1109/ISQED.2000.838891)
Citations: 16

Abstract

A novel reconfigurable low-power high-performance matrix multiplier architecture and its component circuits are presented. The processor can be easily reconfigured to compute the product of matrices $X_{n \times k}$ and $Y_{k \times m}$ for any integers n, k, m and any item precision b (ranging from 4 to 64 bits), thus maximizing the utilization of the available hardware. As a typical example, the hardware equivalent to one $64 \times 64$-bit high-precision multiplier in the system can be directly reconfigured to produce the product of two matrices $X_{8 \times 8}$ and $Y_{8 \times 8}$ of 8-bit items in 9 pipeline cycles, a computation that would require 512 multiplications (done by large multipliers) in a non-reconfigurable high-precision system. Given an input stream of $h \times h$ matrix pairs with b-bit items, the processor, called a matrix multiplier of size s (note $s = hb$), may consist of an array of $(s/m)^2$ small $m \times m$ multipliers (the m = 4 case is illustrated), a few arrays of adders each adding three numbers, an array of accumulators, and corresponding simple reconfiguration switches. To compute the product of $X_{n \times k}$ and $Y_{k \times m}$ of item precision b on the proposed processor of size s, we only need to partition $X_{n \times k}$ and $Y_{k \times m}$ into $(s/b) \times (s/b)$ submatrices, reconfigure the processor according to the values of s (fixed) and b (input parameter), compute the products of the submatrices, and accumulate them into the desired result in pipelined fashion. A recently proposed shift switch logic, a nonbinary logic for arithmetic circuits, is utilized in the design. The novel logic operates on 4-bit state signals in which no more than half of the signal bits are subject to value change at any logic stage, which, as verified by SPICE simulation, significantly reduces circuit power dissipation while maintaining high speed and small VLSI area.
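The partition-and-accumulate procedure described in the abstract can be sketched in software. The Python below is only an illustrative model under stated assumptions, not the paper's circuit: the function names (`small_multiplier_count`, `multiply_from_small`, `block_matmul`) are hypothetical, items are assumed to be unsigned integers, and building a b-bit product from 4-bit partial products uses the standard shift-and-add decomposition, which may differ from how the hardware arranges its adders and switches.

```python
def small_multiplier_count(s, w=4):
    """Number of w x w-bit small multipliers in a processor of size s: (s/w)^2."""
    if s % w != 0:
        raise ValueError("s must be a multiple of the small-multiplier width")
    return (s // w) ** 2


def multiply_from_small(x, y, b, w=4):
    """Unsigned b-bit product assembled from w x w-bit partial products.

    Assumption: plain shift-and-add recombination, used here only to show
    why small multipliers can serve any precision b that is a multiple of w.
    """
    if b % w != 0:
        raise ValueError("b must be a multiple of w")
    if not (0 <= x < (1 << b) and 0 <= y < (1 << b)):
        raise ValueError("operands must fit in b bits")
    digits = b // w
    mask = (1 << w) - 1
    total = 0
    for i in range(digits):
        xi = (x >> (w * i)) & mask          # i-th w-bit digit of x
        for j in range(digits):
            yj = (y >> (w * j)) & mask      # j-th w-bit digit of y
            total += (xi * yj) << (w * (i + j))
    return total


def block_matmul(X, Y, s, b):
    """Multiply X (n x k) by Y (k x m) tile by tile, with tile side h = s // b.

    Returns the product and the number of tile products accumulated.
    Edge tiles may be smaller than h x h when n, k, or m is not a multiple of h.
    """
    if s % b != 0:
        raise ValueError("s must be a multiple of b")
    h = s // b
    n, k, m = len(X), len(Y), len(Y[0])
    if any(len(row) != k for row in X) or any(len(row) != m for row in Y):
        raise ValueError("matrix shapes are inconsistent")
    Z = [[0] * m for _ in range(n)]
    tile_products = 0
    for i0 in range(0, n, h):
        for j0 in range(0, m, h):
            for p0 in range(0, k, h):
                tile_products += 1
                # Accumulate one submatrix product into the result tile.
                for i in range(i0, min(i0 + h, n)):
                    for j in range(j0, min(j0 + h, m)):
                        acc = Z[i][j]
                        for p in range(p0, min(p0 + h, k)):
                            acc += multiply_from_small(X[i][p], Y[p][j], b)
                        Z[i][j] = acc
    return Z, tile_products
```

With s = 64 and b = 8 the tile side is h = 8, so one 8 x 8 matrix pair is a single tile product. It contains 8 · 8 · 8 = 512 scalar multiplications, which matches the count quoted in the abstract, and `small_multiplier_count(64)` gives the 256 small 4 x 4-bit multipliers that the (s/m)^2 formula implies for m = 4.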