融合乘加微架构，包括独立的早期归一化乘加管道

2011 IEEE 20th Symposium on Computer Arithmetic Pub Date : 2011-07-25 DOI:10.1109/ARITH.2011.25

D. Lutz

{"title":"融合乘加微架构，包括独立的早期归一化乘加管道","authors":"D. Lutz","doi":"10.1109/ARITH.2011.25","DOIUrl":null,"url":null,"abstract":"We present an IEEE 754-2008 and ARM compliant floating-point micro architecture that preserves the higher performance of separate multiply and add units while decreasing the effective latency of fused multiply-adds (FMAs). The multiplier supports subnormals in a novel and faster manner, shifting the partial products so that injection rounding can be used. The early-normalizing adder retains the low latency of a split path near/far adder, but does so in a unified path with less area. The adder also allows rounding on effective subtractions involving one input that is twice the normal width, a necessary feature for handling FMAs. The resulting floating-point unit has about twice the (IPC) performance of the best previous ARM design, and can be clocked at a higher speed despite the wider paths required by FMAs.","PeriodicalId":272151,"journal":{"name":"2011 IEEE 20th Symposium on Computer Arithmetic","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":"{\"title\":\"Fused Multiply-Add Microarchitecture Comprising Separate Early-Normalizing Multiply and Add Pipelines\",\"authors\":\"D. Lutz\",\"doi\":\"10.1109/ARITH.2011.25\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present an IEEE 754-2008 and ARM compliant floating-point micro architecture that preserves the higher performance of separate multiply and add units while decreasing the effective latency of fused multiply-adds (FMAs). The multiplier supports subnormals in a novel and faster manner, shifting the partial products so that injection rounding can be used. The early-normalizing adder retains the low latency of a split path near/far adder, but does so in a unified path with less area. The adder also allows rounding on effective subtractions involving one input that is twice the normal width, a necessary feature for handling FMAs. The resulting floating-point unit has about twice the (IPC) performance of the best previous ARM design, and can be clocked at a higher speed despite the wider paths required by FMAs.\",\"PeriodicalId\":272151,\"journal\":{\"name\":\"2011 IEEE 20th Symposium on Computer Arithmetic\",\"volume\":\"34 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"23\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE 20th Symposium on Computer Arithmetic\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ARITH.2011.25\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 20th Symposium on Computer Arithmetic","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ARITH.2011.25","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 23

摘要

我们提出了一种符合IEEE 754-2008和ARM标准的浮点微架构，该架构既保留了独立乘法和加法单元的更高性能，又降低了融合乘法和加法(fma)的有效延迟。乘法器以一种新颖的、更快的方式支持次法线，移动部分乘积，从而可以使用注入舍入。早期归一化加法器保留了分割路径近/远加法器的低延迟，但在面积较小的统一路径上实现了这一点。加法器还允许对有效减法进行舍入，其中一个输入是正常宽度的两倍，这是处理fma的必要功能。由此产生的浮点单元的IPC性能大约是以前最好的ARM设计的两倍，尽管fma需要更宽的路径，但可以以更高的速度进行时钟处理。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Fused Multiply-Add Microarchitecture Comprising Separate Early-Normalizing Multiply and Add Pipelines

We present an IEEE 754-2008 and ARM compliant floating-point micro architecture that preserves the higher performance of separate multiply and add units while decreasing the effective latency of fused multiply-adds (FMAs). The multiplier supports subnormals in a novel and faster manner, shifting the partial products so that injection rounding can be used. The early-normalizing adder retains the low latency of a split path near/far adder, but does so in a unified path with less area. The adder also allows rounding on effective subtractions involving one input that is twice the normal width, a necessary feature for handling FMAs. The resulting floating-point unit has about twice the (IPC) performance of the best previous ARM design, and can be clocked at a higher speed despite the wider paths required by FMAs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 IEEE 20th Symposium on Computer Arithmetic

自引率

0.00%

发文量

期刊最新文献

Fused Multiply-Add Microarchitecture Comprising Separate Early-Normalizing Multiply and Add Pipelines A 1.5 Ghz VLIW DSP CPU with Integrated Floating Point and Fixed Point Instructions in 40 nm CMOS Flocq: A Unified Library for Proving Floating-Point Algorithms in Coq Teraflop FPGA Design Self Checking in Current Floating-Point Units