HRCIM-NTT: An Efficient Compute-in-Memory NTT Accelerator With Hybrid-Redundant Numbers

Impact factor: 5.2; Zone 1 (Engineering & Technology); JCR Q1 (Engineering, Electrical & Electronic); IEEE Transactions on Circuits and Systems I: Regular Papers; Publication date: 2024-09-30; DOI: 10.1109/TCSI.2024.3463184

Xu Zhang;Yaodong Wei;Minghao Li;Jing Tian;Zhongfeng Wang

IEEE Transactions on Circuits and Systems I: Regular Papers, Volume 72, Issue 1, pp. 214-227. Journal article. https://ieeexplore.ieee.org/document/10700038/

Citations: 0

Abstract



HRCIM-NTT: An Efficient Compute-in-Memory NTT Accelerator With Hybrid-Redundant Numbers

NIST recently selected four Post-Quantum Cryptography (PQC) algorithms for standardization. Three of them are lattice-based schemes whose computing bottleneck is the number-theoretic transform (NTT), which calls for fast, low-power hardware implementations. This work presents a high-speed, power-efficient NTT accelerator that leverages the compute-in-memory (CIM) technique with bottom-up optimizations. First, a carry-free modular multiplication (CFMM) algorithm is proposed, which uses on-the-fly reduction and a hybrid-redundant representation to optimize the butterfly unit operation, the cornerstone of the NTT. Based on this algorithm, an efficient butterfly unit in memory (BUIM) is co-designed with the SRAM circuit, which saves memory access energy, reduces operation cycles, and yields an ultra-short critical path. In addition, the data pattern of the CIM array is improved to avoid redundant memory read/write operations, further reducing memory access overhead. Finally, a pipelined operation flow is combined with a constant interstage data mapping strategy to give the proposed hybrid-redundant CIM NTT (HRCIM-NTT) architecture minimized computing cycles and reduced routing overhead. Implemented in 45 nm CMOS technology, HRCIM-NTT achieves the highest throughput and lowest latency among existing CIM-based NTT accelerators.
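For readers unfamiliar with the operations named in the abstract, the following is a minimal software sketch of a textbook radix-2 NTT built from butterfly operations, plus a generic carry-save addition that shows what a redundant (carry-free) number representation looks like. It is not the paper's CFMM algorithm or BUIM circuit, whose details are not given here. The function names, the toy modulus q = 17, and the root of unity 2 are assumptions chosen only for illustration.

```python
def butterfly(a, b, w, q):
    """Cooley-Tukey butterfly: (a, b) -> (a + w*b, a - w*b) mod q."""
    t = (w * b) % q  # modular multiplication, the costly step in hardware
    return (a + t) % q, (a - t) % q


def ntt(values, omega, q):
    """Iterative radix-2 cyclic NTT.

    Assumes len(values) is a power of two and omega is a primitive
    len(values)-th root of unity modulo q (illustrative toy setup).
    """
    n = len(values)
    a = list(values)
    # Bit-reversal permutation so butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) stages, each applying n/2 butterflies.
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)
        for start in range(0, n, length):
            w = 1
            for k in range(length // 2):
                lo, hi = start + k, start + k + length // 2
                a[lo], a[hi] = butterfly(a[lo], a[hi], w, q)
                w = (w * w_len) % q
        length *= 2
    return a


def naive_ntt(values, omega, q):
    """Direct O(n^2) definition, used only to cross-check ntt()."""
    n = len(values)
    return [
        sum(values[j] * pow(omega, i * j, q) for j in range(n)) % q
        for i in range(n)
    ]


def carry_save_add(x, y, z):
    """Add three non-negative integers without propagating carries.

    Returns a redundant (sum, carry) pair whose ordinary sum equals
    x + y + z. This is generic carry-save arithmetic, shown as an
    assumption-level illustration of a redundant representation.
    """
    s = x ^ y ^ z                            # bitwise sum, no carries
    c = ((x & y) | (y & z) | (x & z)) << 1   # carries, deferred one position
    return s, c
```

As a usage example, `ntt([1, 2, 3, 4, 5, 6, 7, 8], 2, 17)` can be compared against `naive_ntt` with the same arguments, since 2 is a primitive 8th root of unity modulo 17. The carry-save pair defers the slow carry-propagating addition to a single final step, which is the general reason redundant representations shorten critical paths.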


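Source journal: IEEE Transactions on Circuits and Systems I: Regular Papers (Engineering & Technology; Engineering: Electrical & Electronic)
CiteScore: 9.80
Self-citation rate: 11.80%
Articles published: 441
Review time: 2 months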

Journal description: TCAS I publishes regular papers on the theory, analysis, design, and practical implementation of circuits, and on the application of circuit techniques to systems and to signal processing. Coverage spans the whole spectrum from basic scientific theory to industrial applications. Fields of interest include: Circuits: Analog, Digital and Mixed Signal Circuits and Systems; Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems; Software for Analog-and-Logic Circuits and Systems; Control aspects of Circuits and Systems.
