HRCIM-NTT: An Efficient Compute-in-Memory NTT Accelerator With Hybrid-Redundant Numbers

IEEE Transactions on Circuits and Systems I: Regular Papers · IF 5.2 · Zone 1 (Engineering & Technology) · JCR Q1 (Engineering, Electrical & Electronic) · Pub Date: 2024-09-30 · DOI: 10.1109/TCSI.2024.3463184
Xu Zhang;Yaodong Wei;Minghao Li;Jing Tian;Zhongfeng Wang
IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 1, pp. 214-227. Source: https://ieeexplore.ieee.org/document/10700038/
Citations: 0

Abstract

NIST recently selected four post-quantum cryptography (PQC) algorithms for standardization. Three of them are lattice-based schemes whose computing bottleneck is the number-theoretic transform (NTT), which calls for fast, low-power hardware implementations. This work presents a high-speed, power-efficient NTT accelerator that leverages the compute-in-memory (CIM) technique with bottom-up optimizations. First, a carry-free modular multiplication (CFMM) algorithm is proposed; it uses on-the-fly reduction and a hybrid-redundant representation to optimize the butterfly unit operation, the cornerstone of the NTT. Based on the optimized algorithm, an efficient butterfly unit in memory (BUIM) is developed by co-design with the SRAM circuit, which saves memory access energy, reduces operation cycles, and achieves an ultra-short critical path. In addition, the data pattern of the CIM array is improved to avoid redundant memory read/write operations, further reducing memory access overhead. Finally, a pipelined operation flow is combined with a constant interstage data mapping strategy to give the proposed hybrid-redundant CIM NTT (HRCIM-NTT) architecture minimized computing cycles and reduced routing overhead. An implementation in 45 nm CMOS technology demonstrates that HRCIM-NTT achieves the highest throughput and lowest latency among existing CIM-based NTT accelerators.
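For readers unfamiliar with the terms in the abstract, the following Python sketch shows what a butterfly unit computes inside an NTT and what reducing "on the fly" can look like in software. It is an illustration under stated assumptions, not the paper's CFMM algorithm or BUIM circuit: `modmul_interleaved` is a conventional bit-serial interleaved modular multiplication, it uses ordinary (non-redundant) integers, and all function names and the toy parameters (q = 17, n = 8, ω = 2) are hypothetical choices made for this example.

```python
def modmul_interleaved(a, b, q, bits):
    """Return a*b mod q, scanning b bit by bit from the MSB.

    Assumes 0 <= a < q and 0 <= b < 2**bits. The accumulator is reduced
    after every doubling and every addition, so it always stays below q.
    This is a textbook interleaved scheme, NOT the paper's CFMM.
    """
    acc = 0
    for i in reversed(range(bits)):
        acc <<= 1                 # acc < q, so 2*acc < 2q
        if acc >= q:
            acc -= q              # one conditional subtraction suffices
        if (b >> i) & 1:
            acc += a              # a < q, so acc < 2q again
            if acc >= q:
                acc -= q
    return acc


def butterfly(u, v, w, q, bits):
    """Radix-2 butterfly: (u + w*v mod q, u - w*v mod q)."""
    t = modmul_interleaved(v, w, q, bits)
    return (u + t) % q, (u - t) % q


def ntt(x, w, q):
    """Recursive radix-2 NTT of x, where len(x) is a power of two and
    w is a primitive len(x)-th root of unity modulo q."""
    n = len(x)
    if n == 1:
        return list(x)
    w2 = w * w % q
    even, odd = ntt(x[0::2], w2, q), ntt(x[1::2], w2, q)
    out, tw, bits = [0] * n, 1, q.bit_length()
    for k in range(n // 2):
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], tw, q, bits)
        tw = tw * w % q           # next twiddle factor w**(k+1)
    return out


def ntt_naive(x, w, q):
    """O(n^2) reference: X[k] = sum_j x[j] * w**(j*k) mod q."""
    n = len(x)
    return [sum(x[j] * pow(w, j * k, q) for j in range(n)) % q
            for k in range(n)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    # 2 is a primitive 8th root of unity mod 17 (2**4 = 16 = -1 mod 17).
    print(ntt(x, 2, 17) == ntt_naive(x, 2, 17))
```

Each NTT stage performs n/2 butterflies, and each butterfly needs one modular multiplication, which is why the abstract treats the butterfly unit as the operation to optimize. The sketch resolves every carry immediately; a redundant number representation, as named in the abstract, would instead defer carry propagation.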
Source journal
IEEE Transactions on Circuits and Systems I: Regular Papers (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 9.80
Self-citation rate: 11.80%
Articles published: 441
Review time: 2 months
Journal description: TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems.