Compact Instruction Set Extensions for Dilithium

IF 2.8 | CAS Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | ACM Transactions on Embedded Computing Systems | Pub Date: 2024-02-02 | DOI: 10.1145/3643826
Lu Li, Qi Tian, Guofeng Qin, Shuaiyu Chen, Weijia Wang
Citations: 0

Abstract

Post-quantum cryptography is considered to provide security against attacks by both classical and quantum computers. Dilithium is a digital signature algorithm whose security rests on the hardness of finding short vectors in lattices, and it has been selected for standardization in the NIST post-quantum cryptography project. Hardware-software co-design is a commonly adopted strategy for meeting competing implementation demands, including limited resources, high performance, and flexibility. In this study, we investigate compact instruction set extensions (ISEs) for Dilithium, aiming to improve software efficiency with low hardware overhead. First, we propose tightly coupled accelerators that are deeply integrated into the RISC-V processor. These accelerators target the most computationally demanding components on resource-constrained processors, such as polynomial generation, the Number Theoretic Transform (NTT), and modular arithmetic. Next, we design a set of custom instructions that integrate with the RISC-V base instruction formats, completing the accelerator design in a compact manner. We then implement our ISEs in a chip design for the Hummingbird E203 core and benchmark Dilithium using these ISEs. We also evaluate the resource consumption of the ISEs on FPGA and ASIC technologies. Compared to the reference software implementation on the RISC-V core, our co-design achieves a speedup factor ranging from 6.95 to 9.96. This improvement comes at the cost of additional hardware resources: a 35% increase in LUTs, a 14% increase in FFs, 7 additional DSPs, and no additional RAM. Furthermore, compared to the state-of-the-art approach, our work is faster at a lower circuit cost: the usage of additional LUTs, FFs, and RAMs is reduced by 47.53%, 50.43%, and 100%, respectively. On ASIC technology, our design has a cell count of 12 412. Our co-design provides a better trade-off between speed and circuit overhead.
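To make the accelerated kernels concrete, the following is a minimal plain-software sketch of the NTT and modular multiplication that the abstract names as targets. It is not the authors' hardware design or instruction semantics, which the abstract does not specify. The parameters (q = 8380417, n = 256, 512th root of unity 1753) are taken from the public Dilithium specification, not from this paper, and all function names are hypothetical.

```python
Q = 8380417   # Dilithium modulus, 2**23 - 2**13 + 1 (from the public spec)
N = 256       # coefficients per polynomial
ZETA = 1753   # primitive 512th root of unity mod Q (from the public spec)


def bitrev8(x):
    """Reverse the 8-bit binary representation of x."""
    return int(f"{x:08b}"[::-1], 2)


# Twiddle factors in the bit-reversed order used by the in-place transform.
ZETAS = [pow(ZETA, bitrev8(k), Q) for k in range(N)]


def mod_mul(a, b):
    """Modular multiplication; a hardware unit would fuse multiply and reduce."""
    return (a * b) % Q


def ntt(poly):
    """Forward negacyclic NTT built from Cooley-Tukey butterflies."""
    w = list(poly)
    m, length = 0, N // 2
    while length >= 1:
        for start in range(0, N, 2 * length):
            m += 1
            z = ZETAS[m]
            for j in range(start, start + length):
                t = mod_mul(z, w[j + length])
                w[j + length] = (w[j] - t) % Q
                w[j] = (w[j] + t) % Q
        length //= 2
    return w


def intt(poly_hat):
    """Inverse NTT built from Gentleman-Sande butterflies, scaled by 1/N."""
    w = list(poly_hat)
    m, length = N, 1
    while length < N:
        for start in range(0, N, 2 * length):
            m -= 1
            z = -ZETAS[m] % Q
            for j in range(start, start + length):
                t = w[j]
                w[j] = (t + w[j + length]) % Q
                w[j + length] = mod_mul(z, (t - w[j + length]) % Q)
        length *= 2
    n_inv = pow(N, Q - 2, Q)  # Q is prime, so Fermat's little theorem applies
    return [mod_mul(n_inv, c) for c in w]


def poly_mul(a, b):
    """Multiply two polynomials modulo X**256 + 1 via pointwise products."""
    return intt([mod_mul(x, y) for x, y in zip(ntt(a), ntt(b))])


def schoolbook_negacyclic(a, b):
    """O(N**2) reference multiplication modulo X**256 + 1, for checking only."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                c[k] = (c[k] + ai * bj) % Q
            else:
                c[k - N] = (c[k - N] - ai * bj) % Q  # X**256 = -1
    return c
```

The inner butterfly (one modular multiply, one add, one subtract) runs 128 times per layer over 8 layers in each transform, which suggests why this loop and its modular reduction are natural candidates for tightly coupled acceleration on a small core.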

Source journal
ACM Transactions on Embedded Computing Systems
Category: Engineering & Technology, Computer Science: Software Engineering
CiteScore: 3.70
Self-citation rate: 0.00%
Articles published: 138
Review time: 6 months
About the journal: The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.