浮点融合点积单元的改进架构

2013 IEEE 21st Symposium on Computer Arithmetic Pub Date : 2013-04-07 DOI:10.1109/ARITH.2013.26

Jongwook Sohn, E. Swartzlander

{"title":"浮点融合点积单元的改进架构","authors":"Jongwook Sohn, E. Swartzlander","doi":"10.1109/ARITH.2013.26","DOIUrl":null,"url":null,"abstract":"This paper presents improved architectures for a floating-point fused two-term dot product unit. The floating-point fused dot product unit is useful for a wide variety of digital signal processing (DSP) applications including complex multiplication and fast Fourier transform (FFT) and discrete cosine transform (DCT) butterfly operations. In order to improve the performance, a new alignment scheme, early normalization, a four-input leading zero anticipation (LZA), a dual-path algorithm, and pipelining are applied. The proposed designs are implemented for single precision and synthesized with a 45nm standard cell library. The proposed dual-path design reduces the latency by 25% compared to the traditional floating-point fused dot product unit. Based on a data flow analysis, the proposed design can be split into three pipeline stages. Since the latencies of the three stages are fairly well balanced, the throughput is increased by a factor of 2.8 compared to the non-pipelined dual-path design.","PeriodicalId":211528,"journal":{"name":"2013 IEEE 21st Symposium on Computer Arithmetic","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"42","resultStr":"{\"title\":\"Improved Architectures for a Floating-Point Fused Dot Product Unit\",\"authors\":\"Jongwook Sohn, E. Swartzlander\",\"doi\":\"10.1109/ARITH.2013.26\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents improved architectures for a floating-point fused two-term dot product unit. The floating-point fused dot product unit is useful for a wide variety of digital signal processing (DSP) applications including complex multiplication and fast Fourier transform (FFT) and discrete cosine transform (DCT) butterfly operations. In order to improve the performance, a new alignment scheme, early normalization, a four-input leading zero anticipation (LZA), a dual-path algorithm, and pipelining are applied. The proposed designs are implemented for single precision and synthesized with a 45nm standard cell library. The proposed dual-path design reduces the latency by 25% compared to the traditional floating-point fused dot product unit. Based on a data flow analysis, the proposed design can be split into three pipeline stages. Since the latencies of the three stages are fairly well balanced, the throughput is increased by a factor of 2.8 compared to the non-pipelined dual-path design.\",\"PeriodicalId\":211528,\"journal\":{\"name\":\"2013 IEEE 21st Symposium on Computer Arithmetic\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-04-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"42\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE 21st Symposium on Computer Arithmetic\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ARITH.2013.26\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE 21st Symposium on Computer Arithmetic","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ARITH.2013.26","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 42

摘要

提出了一种改进的浮点融合两项点积单元结构。浮点融合点积单元可用于各种数字信号处理(DSP)应用，包括复杂乘法和快速傅立叶变换(FFT)和离散余弦变换(DCT)蝴蝶运算。为了提高性能，采用了一种新的对齐方案、早期归一化、四输入前导零预判(LZA)、双路径算法和流水线。所提出的设计实现了单精度，并与45nm标准细胞库合成。与传统的浮点融合点积单元相比，所提出的双路径设计将延迟降低了25%。基于数据流分析，提出的设计可分为三个管道阶段。由于三个阶段的延迟相当平衡，因此与非流水线双路径设计相比，吞吐量增加了2.8倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Improved Architectures for a Floating-Point Fused Dot Product Unit

This paper presents improved architectures for a floating-point fused two-term dot product unit. The floating-point fused dot product unit is useful for a wide variety of digital signal processing (DSP) applications including complex multiplication and fast Fourier transform (FFT) and discrete cosine transform (DCT) butterfly operations. In order to improve the performance, a new alignment scheme, early normalization, a four-input leading zero anticipation (LZA), a dual-path algorithm, and pipelining are applied. The proposed designs are implemented for single precision and synthesized with a 45nm standard cell library. The proposed dual-path design reduces the latency by 25% compared to the traditional floating-point fused dot product unit. Based on a data flow analysis, the proposed design can be split into three pipeline stages. Since the latencies of the three stages are fairly well balanced, the throughput is increased by a factor of 2.8 compared to the non-pipelined dual-path design.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2013 IEEE 21st Symposium on Computer Arithmetic

自引率

0.00%

发文量

期刊最新文献

Numerical Reproducibility and Accuracy at ExaScale Truncated Logarithmic Approximation Comparison between Binary64 and Decimal64 Floating-Point Numbers Split-Path Fused Floating Point Multiply Accumulate (FPMAC) Precision, Accuracy, and Rounding Error Propagation in Exascale Computing