Low-Latency Design and Implementation of the Squaring in Class Groups for Verifiable Delay Function Using Redundant Representation

IACR Trans. Cryptogr. Hardw. Embed. Syst., pp. 438-462. Published: 2022-11-29. DOI: 10.46586/tches.v2023.i1.438-462

Danyang Zhu, Rong-Xian Zhang, Lun Ou, Jing Tian, Zhongfeng Wang

Abstract

A verifiable delay function (VDF) is a function whose evaluation requires running a prescribed number of sequential steps over a group, while its result can be verified efficiently. As cryptographic primitives, VDFs have been adopted in a rapidly growing range of applications in decentralized systems. For VDFs to be secure in practice, it is widely agreed that the fastest implementation of VDF evaluation, namely sequential squaring in a group of unknown order, should be publicly available. To this end, we propose a hardware implementation of squaring in class groups that aims for the lowest possible latency through co-optimization at the algorithmic and architectural levels. First, we devise low-latency architectures for large-number division, multiplication, and addition based on redundant representation. Second, we present two hardware-friendly algorithms that avoid the time-consuming divisions in computations related to the extended greatest common divisor (XGCD), and we design the corresponding low-latency architectures. In addition, we schedule and reuse these computation modules under compact instruction control to achieve good resource utilization. Finally, we code the proposed design and synthesize it in TSMC 28 nm CMOS technology. Experimental results show that our design achieves a 3.6x speedup over the state-of-the-art implementation of squaring in class groups, and that it is 9.1x faster than the optimal C++ implementation on an advanced CPU.
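To make the evaluation step concrete, here is a minimal Python sketch (not the authors' design) of VDF evaluation as T sequential squarings in an imaginary quadratic class group, with group elements written as reduced binary quadratic forms (a, b, c) of discriminant D = b^2 - 4ac < 0. The helper names (normalize, reduce_form, square, vdf_eval) are hypothetical. The squaring uses the textbook composition formula and assumes that -D is a prime congruent to 3 mod 4, which guarantees gcd(a, b) = 1 for every reduced form; optimized implementations commonly use Shanks's NUDUPL algorithm instead. The modular inverse marks where an XGCD enters the computation.

```python
from __future__ import annotations

# Minimal sketch (assumption): forms (a, b, c) with discriminant D = b*b - 4*a*c < 0,
# where -D is a prime p = 3 (mod 4), so every reduced form has gcd(a, b) == 1.


def normalize(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Substitute x -> x + r*y so that -a < b <= a; the form class is unchanged."""
    r = (a - b) // (2 * a)
    return a, b + 2 * a * r, a * r * r + b * r + c


def reduce_form(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Return the reduced representative: |b| <= a <= c, and b >= 0 if a == |b| or a == c."""
    a, b, c = normalize(a, b, c)
    while a > c or (a == c and b < 0):
        # Substitute (x, y) -> (-y, x), which swaps a and c and negates b.
        a, b, c = normalize(c, -b, a)
    return a, b, c


def square(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Square a reduced form using the composition formula valid when gcd(a, b) == 1."""
    # Solve b*mu = c (mod a); computing b^-1 mod a is where an XGCD is needed.
    mu = (c * pow(b, -1, a)) % a
    A = a * a
    B = b - 2 * a * mu
    C = mu * mu - (b * mu - c) // a  # exact, because b*mu = c (mod a)
    return reduce_form(A, B, C)


def vdf_eval(form: tuple[int, int, int], T: int) -> tuple[int, int, int]:
    """Compute form^(2^T); each squaring needs the previous result, so the loop is sequential."""
    for _ in range(T):
        form = square(*form)
    return form
```

For D = -23 (class number 3), squaring the form (2, 1, 3) yields (2, -1, 3), its inverse, and squaring twice returns (2, 1, 3). The sequential dependency in vdf_eval is exactly why the latency of a single squaring, rather than throughput, bounds how fast the VDF can be evaluated.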
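Redundant representation can be illustrated with carry-save arithmetic. The sketch below is an assumption-laden illustration, not the paper's adder or multiplier: a running sum is held as a pair (s, c) whose value is s + c, each new operand is absorbed by a 3:2 compressor whose delay does not depend on the word length, and the carry chain is resolved only once at the end. The function names are hypothetical, and unbounded non-negative Python integers stand in for fixed-width hardware registers.

```python
from __future__ import annotations


def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compressor: return (s, c) with s + c == x + y + z and no carry chain."""
    s = x ^ y ^ z                              # per-bit sum of the three inputs
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority, i.e. carry-out, shifted up
    return s, c


def csa_multiply(a: int, b: int) -> int:
    """Shift-and-add multiplication with partial products kept in carry-save form."""
    assert a >= 0 and b >= 0
    s, c = 0, 0                                # redundant accumulator: value is s + c
    i = 0
    while b >> i:
        if (b >> i) & 1:
            s, c = csa(s, c, a << i)           # add partial product a * 2**i without carries
        i += 1
    return s + c                               # single final carry-propagate addition
```

A hardware multiplier would typically arrange such compressors as a tree (for example a Wallace tree) to cut the depth further; the sequential loop here only demonstrates that the value s + c stays exact while no carry ever propagates across the word until the final addition.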
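The abstract also applies redundant representation to large-number division. As an illustration of the general idea (a swapped-in textbook technique, not the paper's divider), radix-2 non-restoring division picks each quotient digit from the signed set {-1, +1} using only the sign of the partial remainder, stores the positive and negative digits as separate bit vectors, and converts them to ordinary binary once at the end. Practical low-latency dividers such as SRT go further, using the digit set {-1, 0, +1} and a carry-save partial remainder so that digit selection inspects only a few leading bits; this sketch keeps the remainder as a plain integer for clarity. The function name and the bits parameter are hypothetical.

```python
from __future__ import annotations


def nonrestoring_divide(n: int, d: int, bits: int) -> tuple[int, int]:
    """Return (q, r) with n == q*d + r and 0 <= r < d, for 0 <= n < 2**bits and d > 0."""
    assert d > 0 and 0 <= n < (1 << bits)
    dd = d << bits          # divisor aligned with the top of the dividend
    r = n                   # partial remainder; invariant: -dd <= r < dd
    qp = qn = 0             # bit vectors of +1 and -1 quotient digits (redundant form)
    for i in range(bits - 1, -1, -1):
        if r >= 0:          # digit selection needs only the sign of r
            r = 2 * r - dd
            qp |= 1 << i
        else:
            r = 2 * r + dd
            qn |= 1 << i
    q = qp - qn             # one conversion from signed digits to binary
    # Now r == 2**bits * (n - q*d) with -d <= n - q*d < d; fix a negative remainder.
    if r < 0:
        q -= 1
        r += dd
    return q, r >> bits
```

For example, dividing 7 by 2 with bits = 3 selects the digits +1, -1, +1, giving q = 4 - 2 + 1 = 3 and remainder 1.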
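The abstract does not spell out the two division-free XGCD algorithms, so the sketch below uses a different, well-known technique with the same goal: the binary extended GCD (Stein's algorithm extended with Bezout coefficients), which needs only parity tests, shifts, additions, and subtractions. It is an illustrative assumption, not the authors' algorithm, and binary_xgcd is a hypothetical name.

```python
from __future__ import annotations


def binary_xgcd(x: int, y: int) -> tuple[int, int, int]:
    """Return (g, s, t) with g == gcd(x, y) and s*x + t*y == g, for x, y > 0.

    Only parity tests, shifts, additions and subtractions are used (no division).
    """
    assert x > 0 and y > 0
    k = 0
    while ((x | y) & 1) == 0:        # strip common factors of two; gcd gains 2**k
        x >>= 1
        y >>= 1
        k += 1
    u, v = x, y
    A, B, C, D = 1, 0, 0, 1          # invariants: A*x + B*y == u and C*x + D*y == v
    while u != 0:
        while (u & 1) == 0:
            u >>= 1
            if (A & 1) == 0 and (B & 1) == 0:
                A >>= 1
                B >>= 1
            else:                    # A + y and B - x are both even in this case
                A = (A + y) >> 1
                B = (B - x) >> 1
        while (v & 1) == 0:
            v >>= 1
            if (C & 1) == 0 and (D & 1) == 0:
                C >>= 1
                D >>= 1
            else:
                C = (C + y) >> 1
                D = (D - x) >> 1
        if u >= v:
            u -= v
            A -= C
            B -= D
        else:
            v -= u
            C -= A
            D -= B
    # u == 0, so v == gcd of the reduced x and y; C*x + D*y == v still holds.
    return v << k, C, D
```

In class-group squaring, an XGCD of this kind can supply the modular inverses required by the composition formulas without a long-division unit; the trade-off is a data-dependent number of iterations, each of which is cheap in hardware.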
