High-performance CORDIC-based approximate MAC architectures for FPGA platforms

IF 2.5; Zone 3 (Engineering & Technology); JCR Q3: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Integration-The Vlsi Journal; Pub Date: 2025-03-01; Epub Date: 2024-12-16; DOI: 10.1016/j.vlsi.2024.102338

Burhan Khurshid

Integration-The Vlsi Journal, vol. 101, Article 102338. Journal Article; not open access. https://www.sciencedirect.com/science/article/pii/S0167926024002025

Citations: 0

Abstract

CORDIC is a versatile algorithm frequently used in different signal-processing operations. While using CORDIC-based computations in evaluating trigonometric and transcendental functions is quite prevalent, the resource overhead associated with its implementation does not justify its use in evaluating linear functions like multiplication and addition. However, with the emergence of approximate computing as an attractive paradigm for error-resilient applications, the algorithm can be used to design approximate linear computational units that completely justify the accuracy-performance trade-offs. In this paper, we model the CORDIC-based computations to emulate the multiply-accumulate operation, albeit with some loss of accuracy. We specifically present two incremental CORDIC-based multiply-accumulate architectures with an attempt to improve the accuracy-performance trade-offs with each increment. A detailed Pareto analysis for 8- and 16-bit unsigned and signed multiply-accumulate structures is conducted to determine the optimum number of computing stages and the associated bit-precision of the intermediate results. Accuracy and performance analysis using 6th and 7th generation FPGAs reveals a substantial improvement over state-of-the-art designs. The proposed architectures are also tested using three image processing applications, and the output results are promising.
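To make the idea of emulating multiply-accumulate with CORDIC concrete, here is a minimal software sketch of the textbook linear-mode CORDIC recurrence (rotation mode). It is not the paper's architecture: the function name `cordic_linear_mac`, the default stage count, and the assumption that the multiplier is pre-scaled so that |z| < 2 are all illustrative choices. Each stage only adds or subtracts a shifted copy of `x`, which is why the number of stages trades accuracy against cost.

```python
def cordic_linear_mac(x, z, acc=0.0, stages=8):
    """Approximate acc + x*z with linear-mode CORDIC (illustrative sketch).

    Assumes |z| < 2 so that the step sizes 1, 1/2, 1/4, ... can drive z to zero.
    After `stages` iterations the residual satisfies |z| <= 2**-(stages - 1),
    so the absolute error is at most |x| * 2**-(stages - 1).
    """
    y = acc
    for i in range(stages):
        step = 2.0 ** -i           # a right shift by i bits in hardware
        d = 1 if z >= 0 else -1    # direction that moves z toward zero
        y += d * x * step          # accumulate a shifted copy of x
        z -= d * step              # consume the matching part of the multiplier
    return y


if __name__ == "__main__":
    exact = 0.25 + 0.75 * 0.5
    for n in (4, 8, 12):
        approx = cordic_linear_mac(0.75, 0.5, acc=0.25, stages=n)
        print(n, approx, abs(approx - exact))
```

For integer operands, one hypothetical mapping is to scale the multiplier into a fraction first (for example `z = b / 256` for an 8-bit unsigned `b`) and rescale the result afterwards; the fixed-point widths and stage counts actually chosen in the paper come from its Pareto analysis and are not reproduced here.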


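Source journal

Integration-The Vlsi Journal (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 3.80

Self-citation rate: 5.30%

Articles published: 107

Review time: 6 months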

About the journal: Integration's aim is to cover every aspect of the VLSI area, with an emphasis on cross-fertilization between various fields of science, and the design, verification, test and applications of integrated circuits and systems, as well as closely related topics in process and device technologies. Individual issues will feature peer-reviewed tutorials and articles as well as reviews of recent publications. The intended coverage of the journal can be assessed by examining the following (non-exclusive) list of topics: Specification methods and languages; Analog/Digital Integrated Circuits and Systems; VLSI architectures; Algorithms, methods and tools for modeling, simulation, synthesis and verification of integrated circuits and systems of any complexity; Embedded systems; High-level synthesis for VLSI systems; Logic synthesis and finite automata; Testing, design-for-test and test generation algorithms; Physical design; Formal verification; Algorithms implemented in VLSI systems; Systems engineering; Heterogeneous systems.