ECO-CRYSTALS: Efficient Cryptography CRYSTALS on Standard RISC-V ISA

IF 3.8 · Zone 2, Computer Science · JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · IEEE Transactions on Computers · Pub Date: 2024-10-21 · DOI: 10.1109/TC.2024.3483631
Xinyi Ji;Jiankuo Dong;Junhao Huang;Zhijian Yuan;Wangchen Dai;Fu Xiao;Jingqiang Lin
IEEE Transactions on Computers, vol. 74, no. 2, pp. 401-413. Full record: https://ieeexplore.ieee.org/document/10723802/
Citations: 0

Abstract

ECO-CRYSTALS: Efficient Cryptography CRYSTALS on Standard RISC-V ISA
The field of post-quantum cryptography (PQC) is continuously evolving, and many researchers are exploring efficient PQC implementations on various platforms, including x86, ARM, FPGA, and GPU. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on the standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition, CRYSTALS-Kyber and CRYSTALS-Dilithium, whose two most time-consuming operations are Keccak and polynomial multiplication. Notably, this paper presents the first highly optimized assembly software implementation of Kyber and Dilithium on the 64-bit RISC-V ISA. First, we propose a better scheduling strategy for Keccak, tailored specifically to the 64-bit dual-issue RISC-V architecture. Our 24-round Keccak permutation (Keccak-$p$[1600,24]) achieves a 59.18% speed-up over the reference implementation. Second, we apply two modular arithmetic techniques (Montgomery arithmetic and Plantard arithmetic) in the polynomial multiplication of Kyber and Dilithium to obtain better lazy reduction. We then propose a flexible dual-instruction-issue scheme for the Number Theoretic Transform (NTT). For matrix-vector multiplication, we introduce a row-to-column processing methodology to minimize expensive memory accesses. Compared to the reference implementation, we obtain speedups of 53.85% to 85.57% for NTT, matrix-vector multiplication, and inverse NTT (INTT) in ECO-CRYSTALS. Finally, the ECO-CRYSTALS implementations of key generation, encapsulation, and decapsulation in Kyber take 399k, 448k, and 479k cycles, respectively, achieving speedups of 60.82%, 63.93%, and 65.56% over the NIST reference implementation. Similarly, the ECO-CRYSTALS implementations of key generation, signing, and verification in Dilithium take 1,364k, 3,191k, and 1,369k cycles, for speedups of 54.84%, 64.98%, and 57.20%, respectively.
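The abstract names Montgomery and Plantard arithmetic as the two modular reduction techniques used in polynomial multiplication. The following is a minimal Python sketch of the textbook form of both, not the authors' RISC-V assembly. It assumes Kyber's modulus q = 3329 and a 16-bit word size; the helper names (`to_signed`, `montgomery_reduce`, `plantard_mul`) and constants are hypothetical and chosen for illustration. The Plantard routine shown is the original unsigned formulation, which may differ from the variant used in the paper.

```python
Q = 3329   # Kyber modulus (assumed target for this sketch)
L = 16     # word size in bits (an assumption of this sketch)

# Montgomery constant: q^-1 mod 2^16.
MONT_QINV = pow(Q, -1, 1 << L)
# Plantard constant: q^-1 mod 2^32 (double word size).
PLANT_QINV = pow(Q, -1, 1 << (2 * L))


def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def montgomery_reduce(a):
    """Signed Montgomery reduction.

    For |a| < Q * 2^15, returns t with t = a * 2^-16 (mod Q) and |t| < Q.
    The result is not canonical: the loose bound |t| < Q is what lets
    callers defer (lazily skip) a final correction step.
    """
    m = to_signed(a * MONT_QINV, L)   # m = a * q^-1 mod 2^16, as signed 16-bit
    return (a - m * Q) >> L           # a - m*Q is divisible by 2^16, so this is exact


def plantard_mul(a, b):
    """Original (unsigned) Plantard multiplication.

    For 0 <= a, b <= Q, returns c in [0, Q) with c = -a*b*2^-32 (mod Q).
    Requires Q < 2^16 / golden ratio, which holds for Q = 3329.
    """
    mask = (1 << (2 * L)) - 1
    v = (a * b * PLANT_QINV) & mask   # a*b*q^-1 mod 2^32
    return (((v >> L) + 1) * Q) >> L  # high half, plus one, times q, high half again
```

In both routines the result carries an extra constant factor (2^-16 for Montgomery, -2^-32 for Plantard), which implementations typically absorb into precomputed constants such as NTT twiddle factors. A commonly cited attraction of Plantard arithmetic is that when `b` is such a constant, the product `b * PLANT_QINV mod 2^32` can be stored ahead of time, removing one multiplication from the inner loop.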
Source journal
IEEE Transactions on Computers (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 6.60
Self-citation rate: 5.40%
Articles published: 199
Review time: 6.0 months
Journal description: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.