High-performance CORDIC-based approximate MAC architectures for FPGA platforms

IF 2.5; Zone 3 (Engineering & Technology); JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Integration-The Vlsi Journal; Pub Date: 2025-03-01; Epub Date: 2024-12-16; DOI: 10.1016/j.vlsi.2024.102338
Burhan Khurshid
Volume 101, Article 102338. Publisher page: https://www.sciencedirect.com/science/article/pii/S0167926024002025
Citations: 0

Abstract

CORDIC is a versatile algorithm frequently used in different signal-processing operations. While using CORDIC-based computations in evaluating trigonometric and transcendental functions is quite prevalent, the resource overhead associated with its implementation does not justify its use in evaluating linear functions like multiplication and addition. However, with the emergence of approximate computing as an attractive paradigm for error-resilient applications, the algorithm can be used to design approximate linear computational units that completely justify the accuracy-performance trade-offs. In this paper, we model the CORDIC-based computations to emulate the multiply-accumulate operation, albeit with some loss of accuracy. We specifically present two incremental CORDIC-based multiply-accumulate architectures with an attempt to improve the accuracy-performance trade-offs with each increment. A detailed Pareto analysis for 8- and 16-bit unsigned and signed multiply-accumulate structures is conducted to determine the optimum number of computing stages and the associated bit-precision of the intermediate results. Accuracy and performance analysis using 6th- and 7th-generation FPGAs reveals a substantial improvement over state-of-the-art designs. The proposed architectures are also tested using three image processing applications, and the output results are promising.
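As a rough illustration of how CORDIC can emulate a multiply-accumulate using only shifts and adds, the sketch below implements the textbook linear-mode CORDIC recurrence. It is not the paper's architecture: the function name `cordic_linear_mac` and the parameters `stages` and `width` are assumptions made for this example, and intermediate values are held in floating point, so the truncation of intermediate results that a hardware design must budget for is ignored.

```python
def cordic_linear_mac(a: int, b: int, acc: float = 0.0,
                      stages: int = 8, width: int = 8) -> float:
    """Approximate acc + a*b with linear-mode CORDIC (hypothetical helper).

    `b` is an unsigned `width`-bit multiplier; `stages` is the number of
    shift-and-add iterations. Fewer stages means less work and more error.
    """
    if not 0 <= b < (1 << width):
        raise ValueError("b must fit in `width` unsigned bits")

    scale = 1 << width
    x = float(a)       # multiplicand: constant in linear mode
    y = acc / scale    # accumulator, pre-scaled to match z's scaling
    z = b / scale      # multiplier mapped into [0, 1) so the iteration converges

    for i in range(1, stages + 1):
        d = 1.0 if z >= 0 else -1.0   # steer the residual z toward zero
        step = 2.0 ** -i              # a right shift by i bits in hardware
        y += d * x * step             # add or subtract the shifted multiplicand
        z -= d * step                 # consume the matching part of the multiplier

    return y * scale   # undo the scaling: approximately acc + a*b


if __name__ == "__main__":
    for n in (4, 8, 12):
        approx = cordic_linear_mac(200, 100, acc=1000, stages=n)
        print(n, approx, abs(approx - (1000 + 200 * 100)))
```

Under these assumptions the residual multiplier after `stages` iterations is at most `2**-stages`, so the absolute error is bounded by `a * 2**(width - stages)`. That bound is the accuracy-versus-stage-count trade-off in its simplest form; a real design would also have to account for the bit-precision of the intermediate results.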
Source journal
Integration-The Vlsi Journal (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 3.80
Self-citation rate: 5.30%
Articles published: 107
Review time: 6 months
Journal description: Integration's aim is to cover every aspect of the VLSI area, with an emphasis on cross-fertilization between various fields of science, and the design, verification, test and applications of integrated circuits and systems, as well as closely related topics in process and device technologies. Individual issues will feature peer-reviewed tutorials and articles as well as reviews of recent publications. The intended coverage of the journal can be assessed by examining the following (non-exclusive) list of topics: Specification methods and languages; Analog/Digital Integrated Circuits and Systems; VLSI architectures; Algorithms, methods and tools for modeling, simulation, synthesis and verification of integrated circuits and systems of any complexity; Embedded systems; High-level synthesis for VLSI systems; Logic synthesis and finite automata; Testing, design-for-test and test generation algorithms; Physical design; Formal verification; Algorithms implemented in VLSI systems; Systems engineering; Heterogeneous systems.